Macromolecular complexes stand at the forefront of contemporary biological 
research. They carry out most functions in the cell and outside the cell, as 
proteins have not evolved to operate in isolation [1]. In fact, it has become 
increasingly challenging to single out proteins or other macromolecules that 
can perform significant cellular functions in isolation. While proteins can act 
as monomers, such as many enzymes, they are invariably integrated within a 
dynamic network where substrate and product fluxes and allosteric regulators 
coordinate the dozens of distinct proteins that carry out a complete pathway. 
Multisubunit complexes maintain and build our genomes; read, write, and 
modify genetic and epigenetic information; execute essential metabolic 
functions; discern extracellular cues; organize signal transduction processes; 
and more. Protein and nucleoprotein complexes also comprise viruses and 
virulence factors secreted by cellular pathogens. Modern biomedicine 
leverages protein complexes for numerous therapeutic applications, ranging 
from the systems that produce them to the therapeutic proteins themselves, 
such as antibodies. In biotechnology, protein assemblies synthesize high-value 
chemical products and will one day capture sunlight and CO₂ as part 
of a cleaner, more sustainable economy. Protein complexes are fundamental 
to transformative processes and innovations across basic biology, biomedi- 
cine, and biotechnology. 

The first volume of Advanced Technologies for Protein Complex Produc- 
tion and Characterization (ATPCPC) originated from work carried out by the 
editor and collaborators in the context of the ComplexINC collaborative 
project (EC FP7 2011-2015). This consortium pooled leading expertise to 
pioneer novel technologies and production tools for complex protein 
biologics. That first volume provided a tour of some of the most valuable 
technologies for producing and purifying protein complexes and the methods 
routinely used for their biochemical, biophysical, and structural characteriza- 
tion [2]. Having achieved over 118,000 accesses and 249 citations, the volume 
has successfully reached a broad and diverse audience, ranging from univer- 
sity students to junior and senior scientists across academia and industry, in 
both fundamental and applied research. This second volume of Advanced 
Technologies for Protein Complex Production and Characterization 
(ATPCPC2) aspires to become a companion to the first one. In ATPCPC2, 
our focus has shifted from the expression platforms that produce complex 
protein biologics to techniques for their identification, exploration, characterization, 
structural determination, and modeling. ATPCPC2 covers a wide 
range of biochemical, biophysical, and structural methodologies adopted 
across laboratories worldwide, facilitating the in-depth study of protein- 
protein and protein-nucleic acid complexes for various purposes and at differ- 
ent scales. 

The driving force behind ATPCPC2 stems from the continuous advance of 
protein complexes as a central research endeavor in biology, biomedicine, and 
biotechnology, combined with the profusion of methods and techniques that 
have been developed (and continue to be developed). Since the publication of 
ATPCPC, some of these methods have led to dramatic leaps forward in our 
understanding of the structure and function of macromolecular complexes. A 
case in point is cryo-electron microscopy’s “resolution revolution” [3]. With 
this new volume, we aim to both complete and expand upon the original 
volume’s scope, with a particular emphasis on the biochemical and structural 
elucidation of protein complexes. 

This volume opens with two chapters dedicated to biochemical methods 
targeting ribonucleoprotein complexes. The first describes techniques involv- 
ing immunoprecipitation (Chap. 1), while the second describes methodologies 
for assembling and purifying RNA-protein complexes, in vitro and in vivo 
(Chap. 2). The following two chapters explore cutting-edge mass spectrome- 
try approaches to study protein complexes, including protein complex com- 
position, subunit stoichiometry, and protein-protein interactions (Chaps. 3 
and 4). Chapter 5 delves into methods to discover and investigate peptide 
linear motifs, underscoring their pivotal roles in mediating the assembly and 
regulation of protein complexes. Chapters 6 and 7 focus on biophysical 
techniques with a strong track record for characterizing macromolecular 
complexes. Specifically, Chap. 6 introduces biolayer interferometry (BLI) as 
a label-free technique to measure the kinetic parameters and affinity constants 
for biological interactions, including those involving proteins, peptides, and 
nucleic acids. Analytical ultracentrifugation (AUC) is the topic of Chap. 7, a 
versatile technique with a broad field of applications that can elucidate the 
size, shape, stoichiometry, and binding affinity of macromolecular complexes 
in solution. Lastly, Chaps. 8 through 13 provide an overview of contemporary 
applications of structural biology methodologies, spanning from nuclear 
magnetic resonance and X-ray crystallography to cutting-edge techniques 
like X-ray free electron lasers, small-angle X-ray scattering, and high- 
resolution cryo-electron microscopy. 

In the concluding remarks of the introductory volume, we posited that 
“Proteins, their complexes, and their activities are central to modern biology, 
biotechnology, and biomedicine, and their heterologous production is often a 
vital prerequisite for discovering their function in health and disease.” Within 
this volume, we have underscored state-of-the-art methodologies and 
strategies for characterizing the proteins and protein complexes that 
researchers produce. We have endeavored to incorporate contributions that are 
up-to-date and authoritative yet sufficiently accessible to be helpful to the 
broadest possible audience: from students to seasoned researchers, spanning 
academic and industrial environments and bridging fundamental science to 
translational biomedicine and biotechnology. The advancements over the past 
decade in techniques for expressing, purifying, and analyzing proteins and 
their complexes have undeniably catalyzed monumental progress in the life 
sciences for the benefit of all. 


Madrid, Spain M. Cristina Vega 


Francisco J. Fernandez 
Abstract 


Throughout their life cycle, messenger RNAs 
(mRNAs) associate with proteins to form 
ribonucleoproteins (mRNPs). Each mRNA is 
part of multiple successive mRNP complexes 
that participate in its biogenesis, cellular 
localization, translation and decay. The 
dynamic composition of mRNP complexes 
and their structural remodelling play crucial 
roles in the control of gene expression. Study- 
ing the endogenous composition of different 
mRNP complexes is a major challenge. In this 
chapter, we describe the variety of protein- 
centric immunoprecipitation methods avail- 
able for the identification of mRNP complexes 
and the requirements for their experimental 
settings. 
1.1 Introduction 

Expression of protein-coding genes in eukaryotes 
is a complex and coordinated mechanism that 
involves many steps and relies on multiple factors 
and several molecular machineries. Cellular RNAs 
are associated with RNA-binding proteins (RBPs) 
to form ribonucleoprotein (RNP) complexes of 
highly dynamic compositions [1, 2]. Throughout 
their life cycle, mRNAs are part of multiple suc- 
cessive mRNP complexes that play critical roles in 
their biogenesis, cellular localization, translation 
and decay [3, 4]. The dynamic composition of 
mRNP complexes allows both post-transcriptional 
and specific translational regulation mechanisms 
to take place [5, 6]. Some trans-acting factors can 
interact stably with specific mRNPs but structural 
remodelling of mRNPs also involves transient 
interactions with protein and RNA chaperone 
complexes and numerous regulatory factors that 
are difficult to trap [7-9]. Studying the endogenous 
composition of different mRNP complexes is 
therefore a major challenge. The advent of new 
and high-throughput approaches has enabled 
the genome-wide determination of mRNP 
composition using techniques such as affinity 
purification-mass spectrometry (MS) as well as 
protein-RNA UV crosslinking approaches combined 
with RNA deep-sequencing (RIP-seq) 
[10-12]. Two types of approaches to study 
RNA-protein complexes have emerged: protein-centric 
and RNA-centric approaches. In protein-centric 
approaches, mRNAs associated with RBPs 
are identified by microarray analysis or RNA-seq 
after immunoprecipitation of the RBPs [13-15]. 
In RNA-centric approaches, RBPs are recovered 
by RNA pull-down methods and subsequently 
identified by mass spectrometry (MS). 
Several independent laboratories have developed 
breakthrough methods for mRNA interactome 
discovery. Based on protein-mRNA interactome 
capture protocols, these studies now allow genome-wide 
mapping of mRNPs in yeast, mammalian, 
invertebrate or plant cells under various 
conditions [16-20]. These unbiased methods 
have emerged as powerful tools to identify new 
RBPs; they nevertheless need to be complemented 
by individual mRNA/protein detection methods in 
order to map pairwise interactions between the 
RBPs and their mRNA targets and to identify 
RBPs with overlapping target specificity. Some 
mRNA-centric methods are limited to the study 
of polyadenylated mRNAs and thus exclude 
mRNA categories such as histone mRNAs that 
lack poly(A) tails. Global analyses can also overlook 
low-abundance mRNAs or mRNAs containing 
non-canonical codons subject to translational 
recoding. Targeted mRNA immunoprecipitation 
experiments are particularly useful in these cases. 
In this chapter, we will focus on standard protein- 
centric immunoprecipitation methods that allow 
the identification of native mRNPs containing spe- 
cific mRNAs of interest. These methods can be 
used as initial targeted approaches or to confirm 
and consolidate high throughput or global analysis 
results. We have applied the panel of immunoprecipitation 
methods described herein to analyse the 
involvement of various chaperone complexes 
and modification enzymes that act transiently in 
the assembly and biogenesis mechanisms of 
selenoprotein mRNAs as well as the role of trans- 
lation initiation complexes in histone H4 mRNA 
translation. 


1.2 Endogenous mRNP Purification Using RNA Immunoprecipitation Methods 


mRNAs form complexes with partner proteins in 
ribonucleoprotein particles (mRNPs) from the 


H. Hayek et al. 


onset of their transcription. Their assembly, transport, 
subcellular localization and degradation 
involve interactions with numerous intermediate 
protein complexes. Protein-centric immunoprecipitation 
methods make it possible to unravel the fate of 
mRNAs and their uptake by different known 
protein complexes. Various methods have been 
developed for the recovery of endogenous 
mRNPs under native physiological conditions or 
after stabilization of the interactions by chemical 
or UV cross-linking (Table 1.1). 


1.2.1 Native Immunopurification 
Native RNA immunoprecipitation (RIP) allows 
the identification of RNA-protein interactions 
under physiological conditions by using a 
protein-specific antibody and the detection of 
interacting RNAs. mRNA interactants can be 
identified by targeted RNA detection methods 
such as quantitative real-time PCR (qRT-PCR) or 
northern blot, but also by genome-wide methods 
such as microarray (RIP-chip) or deep- 
sequencing (RIP-seq). Typical RIP experiments 
for the study of eukaryotic mRNPs are outlined in 
Fig. 1.1a, mainly applying to HEK293 or HeLa 
cells. These methods preserve native complexes 
present in the cells and most often reflect in vivo 
associations. Nevertheless it was demonstrated 
that reassociation of RNA-binding proteins can 
occur after cell lysis [21]. Interactions that were 
prevented in vivo due to differential compartmen- 
talization might occur during the lysis and immu- 
noprecipitation experiment. Several protocols 
have therefore been optimized to minimize 
RNP rearrangements during the process of immu- 
noprecipitation [13, 22]. For instance, purification 
of cytoplasmic mRNPs requires mild lysis 
conditions in order to leave nuclei intact [23-25]. 
Specific conditions for nuclear mRNPs have 
also been developed [26, 27]. Another important 
issue is the specificity of the interactions detected 
by these methods. Indeed, specific interactions 
that occur with low abundance mRNAs in vivo 
could be masked by the non-specific interactions 
of abundant transcripts. Several precautions need 
to be taken in the experimental procedures to 
avoid this (see below). 


1 IP Methods to Isolate mRNA Complexes 


Table 1.1 List of protein-centric methods for the purification of mRNP complexes 

Method of mRNP purification | Applications | Limitations 

In vivo RIP 
  Native RIP | Endogenous RNP complexes purification | Possible RNP rearrangements; possible nonspecific interactions; requires highly specific anti-RBP antibodies 
  Epitope tagging and RIP | Confirmation of native RIP; analysis of low abundance target proteins and their target mRNA; efficient RNP affinity purification | Possible impact of the epitope tag on the target protein (misfolding, loss of function, mislocalization); potential disruption of functional protein complexes; nonspecific interactions with the epitope tag 
  Formaldehyde cross-linking and RIP | Capture of RNA-protein interactions; stabilization of transitory interactions and indirect RNA-binding factors | False positives due to excess of cross-link; false negatives due to low cross-linking efficiency 
  UV cross-linking and RIP | Identification of direct mRNA-protein interactions; trapping of functional interactions; genome-wide analysis: PAR-CLIP, HITS-CLIP, iCLIP, iCLAP, CRAC | False negatives due to low cross-linking efficiency 

Cell free mRNP purification 
  GST pull-down | Reconstitution and screening for mRNA-protein interactions | Possible artefacts due to protein misfolding, lack of post-translational modifications; nonspecific interactions with the GST-tag 
  Cross-linking | Identification of RBPs using in vitro transcribed mRNAs: 4-thioU internally labelled mRNAs; 6-thioG capped mRNAs, tracking of cap-binding proteins | Optimization of in vitro 4-thioU incorporation and 6-thioG cap labelling 

Direct mRNA IP 
  m⁶A Me-RIP, m⁵C RIP | Global identification of mRNA transcriptional modifications patterns | Specificity of the antibodies 
  TMG-IP | Detection of endogenous m₃²,²,⁷G-capped mRNAs | Low recovery levels of mRNAs possibly due to their abundance, stability, short half-life 

RIP RNA immunoprecipitation, PAR-CLIP photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation, 
HITS-CLIP high-throughput sequencing CLIP, iCLIP individual nucleotide resolution CLIP, iCLAP individual- 
nucleotide-resolution UV-cross-linking and affinity purification, CRAC UV cross-linking and analysis of cDNA, GST 
(glutathione-S-transferase) pull-down, m⁶A Me-RIP anti-N⁶-methyladenosine RIP, m⁵C RIP anti-5-methylcytidine RIP, 
TMG-IP anti-m₃²,²,⁷G IP 


1.2.1.1 Key Requirements for RIP 

To prepare high-quality extracts that contain 
intact pre-mRNPs or mature mRNPs several 
factors need to be optimized. In particular extracts 
have to be kept cold and snap-frozen immediately 
after lysis and centrifugation. To limit degrada- 
tion, inhibitors of proteases and ribonucleases 
(RNases) have to be added at all steps. Extracts 
should be diluted only after thawing and during 
the immunoprecipitation steps. The total amount 
of cell extract used in RIP experiments needs to 
be adapted based on the abundance of the 
RNA-binding protein or complex targeted by the 
antibodies as well as the subsequent method of 
RNA detection. Typically, extracts generated 
from 3 to 5 x 10° mammalian cells will be 
required when the RNA is detected by qRT- 
PCR or 5 to 20 x 10° when using microarrays. 
Fig. 1.1 Schematic representation of RNA immunoprecipitation (RIP) methods for 
the identification of endogenous protein-mRNA interactions. Native mRNPs can be 
recovered directly from HEK293 or HeLa cell lysates (a) or after transfection and 
expression of the epitope-tagged protein (b). In both cases RNAs bound to specific proteins 




It is essential to limit potential nonspecific 
interactions of the mRNAs of interest with the 
immunoprecipitation matrix. Various affinity 
matrices and supports can be used successfully 
for the recovery of mRNP complexes after incu- 
bation of the extracts with a specific antibody. 
These include protein A-sepharose, agarose 
and a wide range of magnetic beads. It is advisable 
to compare different matrices because the 
nonspecific binding of a given mRNA can vary 
depending on the immunoprecipitation support. It 
is also recommended to pre-clear the extracts by 
incubation with non-coated beads before immunoprecipitation 
and to dilute the lysate in order to 
remove nonspecific binders. When the targeted 
mRNA will subsequently be detected by methods 
such as qRT-PCR, pre-coating of the beads with 
purified BSA and yeast total tRNA can consider- 
ably reduce nonspecific background (Fig. 1.1). 
For standard RIP, extensive washing of the 
immunoprecipitated pellet is required and the 
stringency of the washing buffers needs to be 
adapted to the stability of the RNP complex 
analysed. The mRNP is ultimately released 
and dissociated into RNA and proteins. This is 
mostly performed under denaturing conditions. 
mRNP affinity purification can also be performed 
using anti-peptide antibodies against the target 
protein. In this case bound mRNP complexes 
can be eluted in native conditions by the 
peptide used to generate the antibodies. This is 
particularly useful when studying low-abundance 
mRNPs. This method allowed us to 
co-immunoprecipitate selenoprotein mRNAs 
and protein complexes associated with SECIS-binding 
protein 2 (SBP2), a key protein that interacts 
with all selenoprotein mRNAs and recruits 
translation and assembly factors to the mRNP 
[28, 29]. In any case, validation steps of the RIP 
will include verification of optimal protein pull-down 
by Western blot and evaluation of the integrity 
of protein complexes associated with the mRNP 
by standard mass spectrometry. 

1.2.1.2 mRNA Identification and Validation of the RIP Experiment 

After release of the RNP components, the ulti- 
mate goal is to isolate and characterize the mRNA 
from the immunoprecipitated pellet (Fig. 1.1). For 
proper interpretation of the data it is essential to 
compare the RIP result to a negative control 
(IP with pre-immune serum or non-RBP protein 
target). For standard RIP, qRT-PCR remains the 
method of choice to measure RNA levels as it 
allows the immunoprecipitated mRNA to be 
normalized to its abundance in total lysate [30]. To 
determine whether the interaction of a given RBP can be 
generalized to an entire family of mRNAs, 
microarray analysis of the RIP content is a useful 
strategy. Differences in results can be observed 
between the two detection methods, most likely 
because qRT-PCR is more sensitive and specific 
than microarrays for the detection of the endoge- 
nous population of mRNAs [31]. Deep- 
sequencing analysis of RIP experiments allows 
the genome-wide characterization of RNAs 
bound to a given RBP and the mapping of 
protein-RNA regulatory networks using appropri- 
ate bioinformatics tools [32, 33]. mRNA 
interactants identified using native purification 
methods often require further experimental vali- 
dation using multiple distinct experimental 
approaches. A way to confirm the validity of the 


Fig. 1.1 (continued) of interest are immunoprecipitated 
from cell lysates in native conditions by using either a 
protein-specific antibody (a) or anti-tag antibody (b). 
After several washing steps the RNA-protein complexes 
(RNPs) are recovered by elution in either denaturing or 
native conditions. The validity of the immunoprecipitation 
experiment is verified by western blot. The RNA is 
extracted and interacting mRNAs are identified by targeted 


RNA detection methods (qRT-PCR, microarray) or global 
high throughput sequencing analysis (RNA-seq). The 
insert represents the preparation of the affinity matrix 
before immunoprecipitation. This includes pre-coating of 
the beads with purified BSA and yeast total tRNA to 
reduce nonspecific background before binding the 
antibodies 


mRNA interactant is to compare the RIP results 
obtained for an endogenous protein under normal 
conditions and after RNAi-mediated inhibition of the target 
protein [34]. Alternative immunoprecipitation 
approaches based on epitope tagging or cross- 
linking methods can be used, and they will be 
discussed below. Functional validation of the 
mRNA-protein interactions unveiled by the RIP 
experiment will have to include a wide panel of 
methodologies such as subcellular localization 
experiments (FISH) [35], yeast two-hybrid interaction 
tests (Y2H) [36], electrophoretic mobility 
shift assays (EMSA) [37, 38], and more. 
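
The normalization of immunoprecipitated mRNA to its abundance in the total lysate, mentioned above for qRT-PCR, can be illustrated with a minimal sketch of the commonly used "percent input" calculation, followed by enrichment over a negative-control IP. This is not taken from the protocols cited in this chapter: the function names, the Ct values and the 10% input fraction are hypothetical, and the calculation assumes a primer efficiency close to 100% (one doubling per cycle) and equal cDNA volumes for all samples.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percentage of a transcript recovered in the IP relative to the lysate.

    ct_ip          -- Ct measured on RNA extracted from the IP pellet
    ct_input       -- Ct measured on the saved input aliquot
    input_fraction -- fraction of the lysate saved as input (0.1 means 10%)

    Assumption: amplification efficiency is ~100% (2-fold per cycle).
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in the interval (0, 1]")
    # Shift the input Ct to the value expected for 100% of the lysate.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def fold_enrichment(pct_specific_ip: float, pct_control_ip: float) -> float:
    """Enrichment of the specific IP over the negative-control IP."""
    return pct_specific_ip / pct_control_ip


# Hypothetical Ct values for one target mRNA (illustration only).
specific = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.1)
control = percent_input(ct_ip=29.0, ct_input=25.0, input_fraction=0.1)
print(f"specific IP: {specific:.2f}% of input")
print(f"control IP:  {control:.3f}% of input")
print(f"fold enrichment over control: {fold_enrichment(specific, control):.1f}")
```

Expressing both IPs as a percentage of the same input corrects for differences in transcript abundance between mRNAs, so that an abundant transcript with high background binding is not mistaken for a specific interactant.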


1.2.2 Epitope Tagging and RIP 

The use of an epitope-tagged RBP for mRNP 
isolation is a useful complementary approach to 
confirm endogenous RIP data. It can also be used 
when the endogenous target protein is not 
detected by standard RIP because of its low 
cellular abundance or the inaccessibility of the 
epitope to the immunoprecipitating antibody 
(Fig. 1.1b). After transfection and expression of 
the epitope tagged protein in cells, usually 
HEK293 or HeLa, mRNAs are coprecipitated 
using antibodies directed against the epitope tag. 
Some of the most commonly used epitope tags 
include FLAG, HA, His, Myc, GST, GFP and V5 
[39, 40]. Limitations linked to the use of epitope 
tags are the possibility for the tag to impact the 
folding of the fusion protein, to disrupt functional 
protein complexes and to interact non-specifically 
with cellular proteins. Large globular tags may 
also affect the function and subcellular localiza- 
tion of the fusion protein. Among various tags the 
green fluorescent protein (GFP) has proven par- 
ticularly efficient for affinity purification of RNP 
complexes and proteomic analysis, despite its 
size, because it shows minimal nonspecific bind- 
ing to mammalian cell proteins [41]. We have 
used epitope tagging successfully to analyse the 
contribution of different chaperone complexes and 
modification enzymes in selenoprotein mRNA 
assembly and translation. This allowed us to 
demonstrate interactions between selenoprotein 
mRNAs and components of the HSP90 chaperone 
complex [28]. Our findings also establish that a 
FLAG-tagged version of trimethylguanosine 
synthase 1 (Tgs1) interacts with selenoprotein 
mRNAs for cap hypermethylation [31]. We 
also showed that the ubiquitous Survival of 
MotoNeurons (SMN) chaperone and the 
methylosome complex devoted to sn- and snoRNP 
maturation contribute to recruiting Tgs1 to 
selenoprotein mRNPs [29]. In this case we used 
GFP-tagging as a method of choice to decipher 
the role of individual SMN and methylosome 
components in selenoprotein mRNP assembly 
and translation. 


1.2.3 Formaldehyde Cross-linking and RIP 


When studying RNA-protein interactions in vivo 
it is essential to minimize RNP reassortments that 
can occur during cell disruption. Chemical 
treatments can be employed for this purpose in 
order to cross-link proteins with their associated 
mRNAs prior to the cell lysis and immunoprecip- 
itation steps (Fig. 1.2a). Formaldehyde is a useful 
reversible crosslinking agent capable of capturing 
interactions between various macromolecules 
mainly by formation of covalent bonds between 
amino groups that are in close proximity (around 
2 Å) [42]. It has been used extensively as a 
fixative to maintain the structural integrity of 
cells and in chromatin immunoprecipitation 
(ChIP) experiments to study DNA-protein 
interactions and cellular networks [43, 44], but 
also to delineate RNA-protein and protein-protein 
interactions [45]. Formaldehyde cross-linking is 
particularly useful to detect transitory interactions 
and to stabilize indirect RNA-binding factors that 
are part of higher-order RNPs. It was used success- 
fully in various studies including the identification 
of RNAs binding to the CTD of RNA polymerase 
II, the binding of the histone acetyltransferase 
Elongator to nascent mRNAs [46, 47] as well as 
the interaction of eukaryotic Translation Initiation 
Factor 3 (eIF3) with various mRNAs [48]. The 
success of the RIP assay depends on the degree 
of cross-linking. The cross-linking procedure 
therefore needs to be optimized to avoid false 
Fig. 1.2 Outline of chemical and UV cross-linking 
immunoprecipitation methods. (a) Formaldehyde 
(FA) cross-linking. (b) UV cross-linking methods coupled 
to high-throughput analysis are represented from 
left to right: photoactivatable ribonucleoside-enhanced 
crosslinking and immunoprecipitation (PAR-CLIP), 
High-throughput sequencing CLIP (HITS-CLIP) and indi- 
vidual-nucleotide resolution CLIP (iCLIP), individual 
nucleotide crosslinking and cDNA analysis (CRAC). 
PAR-CLIP uses 4-thioU ribonucleosides; UV at 365 nm 


positive interactions and to achieve optimal 
RNA-protein complex recovery. The concentra- 
tion of formaldehyde and the duration of cross- 
linking need to be adjusted [49]. Optimal formal- 
dehyde concentration typically ranges between 
0.1% and 1.0% and the duration of fixation varies 
between 5 min and 1 h. Cross-linking reactions are 
quenched by the addition of glycine (pH 7). Inter- 
estingly formaldehyde crosslinking can be 
induces zero-length cross-links between the 
photoactivatable 4-thioU moiety of the RNA and 
RNA-binding proteins (RBPs), while the other three 
methods utilize UV cross-linking at 254 nm. Isolation of 
RNP complexes is achieved by immunoprecipitation 
(IP) or double affinity purification by IP and immobilized 
metal ion affinity chromatography (IMAC) in the case of 
CRAC. FA represents formaldehyde and SU 
represents 4-thioU nucleotides. Yellow dots 
symbolize cross-links between RNA and proteins 


reversed by incubation of the IP pellets at 70 °C 
for the characterization of immunoprecipitated 
components. 
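
Because both the formaldehyde concentration and the fixation time have to be adjusted, a small titration matrix is a practical starting point. The sketch below simply enumerates the conditions to test and rejects values outside the typical ranges quoted above (0.1% to 1.0%, 5 min to 1 h). The function name, the dictionary keys and the three values chosen per axis are illustrative assumptions, not a recommended protocol.

```python
from itertools import product

# Typical ranges quoted in the text; a given RNP may require other values.
FA_RANGE_PCT = (0.1, 1.0)
TIME_RANGE_MIN = (5, 60)


def crosslinking_grid(concentrations_pct, durations_min):
    """Return every (concentration, duration) combination as a list of dicts."""
    conditions = []
    for conc, minutes in product(concentrations_pct, durations_min):
        if not FA_RANGE_PCT[0] <= conc <= FA_RANGE_PCT[1]:
            raise ValueError(f"{conc}% formaldehyde is outside the typical range")
        if not TIME_RANGE_MIN[0] <= minutes <= TIME_RANGE_MIN[1]:
            raise ValueError(f"{minutes} min is outside the typical range")
        conditions.append({"formaldehyde_pct": conc, "minutes": minutes})
    return conditions


# Hypothetical 3 x 3 titration (values chosen for illustration only).
grid = crosslinking_grid([0.1, 0.5, 1.0], [5, 15, 60])
for number, condition in enumerate(grid, start=1):
    print(f"condition {number}: {condition['formaldehyde_pct']}% "
          f"for {condition['minutes']} min")
```

Each condition would then be assessed for RNA-protein complex recovery and for background in the negative-control IP, since excess cross-linking favours false positives and insufficient cross-linking favours false negatives.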


1.2.4 UV Cross-linking and RIP 


Unlike formaldehyde cross-linking, which is able 
to capture interactions with protein complexes, 
UV cross-linking exclusively identifies direct 
RNA-protein interactions. UV irradiation at 
254 nm generates short-lived radicals that react 
with nucleic acids and amino acids such as lysine, 
cysteine, phenylalanine, tryptophan and tyrosine 
located in close proximity (zero distance) forming 
covalent bonds [50]. In vivo UV cross-linking 
thus allows functional protein-RNA interactions 
to be trapped. Notably, it can be combined with 
denaturing purification methods for the identification 
of the interacting RNA and has given rise to 
the cross-linking and immunoprecipitation 
(CLIP) technique [11, 12, 51, 52]. This technique 
can be used for targeted RNA identification but 
has mainly been coupled to high-throughput 
sequencing of cDNA libraries (HITS-CLIP). Several 
powerful variants of the methodology exist 
such as PhotoActivatable Ribonucleoside 
enhanced CLIP (PAR-CLIP) and individual- 
nucleotide resolution CLIP (iCLIP) that allow 
the identification of cross-linking sites at a 
single-nucleotide resolution [14, 15, 53]. 
PAR-CLIP employs the nucleotide analogue 
4-thiouridine (4-thioU), which can be added to 
the growth medium and is taken up by cultured 
cells and incorporated into newly-synthesized 
RNAs [15]. 4-thioU is activated by 365-nm UV 
irradiation, instead of UV 254 nm used in CLIP, 
to generate covalent crosslinks between proteins 
and RNAs. In both CLIP and PAR-CLIP 
experiments the RNA is fragmented by RNase 
treatment and immunoprecipitation allows the 
recovery of co-precipitated RNA fragments. 
Other variations make use of double epitope- 
tagged RBPs in cross-linking experiments 
followed by purification of the RNPs by tandem 
affinity purification (CLAP) or immunoprecipi- 
tation coupled to immobilized metal-ion affinity 
chromatography (IMAC) in the CRAC method 
[54, 55]. Some in vivo methods combine UV 
cross-linking, GFP-based immunoprecipitation 
and quantification of co-isolated polyadenylated 
[poly(A)] RNAs with fluorescent oligo(dT) 
probes [56]. The main high-throughput 
methods are summarized in Fig. 1.2b and 
Table 1.1. 
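
The distinguishing parameters of the UV cross-linking variants can be kept as a small lookup structure when planning an experiment. The sketch below encodes only what is stated in this section and in the legend of Fig. 1.2 (cross-linking wavelength, use of 4-thioU, and purification scheme). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ClipVariant:
    name: str
    uv_nm: int                  # cross-linking wavelength
    analogue: Optional[str]     # photoactivatable nucleoside, if any
    purification: str           # how the RNP is isolated


# Values as stated in the text and in the legend of Fig. 1.2.
VARIANTS = [
    ClipVariant("HITS-CLIP", 254, None, "immunoprecipitation"),
    ClipVariant("iCLIP", 254, None, "immunoprecipitation"),
    ClipVariant("PAR-CLIP", 365, "4-thioU", "immunoprecipitation"),
    ClipVariant("CRAC", 254, None, "immunoprecipitation + IMAC"),
]


def variants_at(uv_nm: int) -> List[str]:
    """Names of the variants that cross-link at the given wavelength."""
    return [v.name for v in VARIANTS if v.uv_nm == uv_nm]


def needs_metabolic_labelling(name: str) -> bool:
    """True if cells must be fed a nucleoside analogue before irradiation."""
    for variant in VARIANTS:
        if variant.name == name:
            return variant.analogue is not None
    raise KeyError(f"unknown variant: {name}")


print(variants_at(254))
print(variants_at(365))
print(needs_metabolic_labelling("PAR-CLIP"))
```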




1.3 Cell-Free Extract mRNP Immunoprecipitation Methods 

1.3.1 GST Pull-Down Experiments Using Total RNA from Cell Extracts 


An alternative approach to endogenous pull- 
down is the exogenous expression of epitope- 
tagged recombinant proteins that can be 
incubated with cell extracts for the reconstitution 
of RNA-protein interactions. Purified recombi- 
nant proteins carrying a glutathione-S-transferase 
(GST) tag have been widely used as a screening 
technique for the identification of protein-protein 
interactions. GST-tagged proteins are also effi- 
cient tools to uncover protein-mRNA interactions 
when incubated with total RNAs isolated from 
cell extracts, native extracts or rabbit reticulocyte 
lysates. The GST-tagged proteins can be efficiently 
recovered from cell extracts using either 
affinity matrices such as glutathione sepharose 
or glutathione magnetic beads, or anti-GST 
antibodies. The co-precipitated RNA can be subsequently 
analyzed by qRT-PCR to detect the 
presence of specific target mRNAs using gene- 
specific primer pairs (Fig. 1.3a and Table 1.1). 
Glutathione-S-transferase (GST) pull-down 
procedures have been developed for the purifica- 
tion of eukaryotic mRNAs using a mutant version 
of the mRNA 5’ cap-binding protein (eIF4E) with 
increased affinity for the m⁷GTP moiety of the 
cap [57]. Using this method we could demon- 
strate that several selenoprotein mRNAs are not 
recognized efficiently by translation initiation fac- 
tor eIF4E and have an alternative cap 
structure [31]. 


1.3.2 Crosslinking of Total Cell Extracts with In Vitro Transcribed 4-thioU or 6-thioG Labelled mRNAs 


The 4-thiouridine PAR-CLIP approach can be 
adapted to study RNA-protein interactions using 
Fig. 1.3 Cell-free extract mRNP immunoprecipitation 
methods. (a) GST pull-down. An exogenous GST-tagged 
recombinant protein produced in E. coli is incubated with 
cell extracts to reconstitute RNA-protein interactions. 
After affinity purification using Glutathione sepharose 
beads, the co-precipitated mRNA can be identified by 
target mRNA identification methods such as qRT-PCR. 
(b) Illustration of cross-linking methods using exogenous 
4-thioU or 6-thioG labelled mRNAs. 4-thioU and ³²P 


an exogenous in vitro transcribed ³²P-labelled 
mRNA containing 4-thioU nucleotides that is 
incubated in the presence of total cell extracts 
(Fig. 1.3b). After UV cross-linking at 365 nm, 
the cross-linked RNA is digested by RNase T1 
or A [58]. Alternatively, in vitro transcribed 
³²P-labelled mRNA devoid of 4-thioU can also 
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labelled transcripts or 6-thioG/*?P cap modified mRNA 
transcripts are incubated with cell free extracts and cross- 
linked to RBPs at 365 nm. Partial RNase digestion allows 
transferring the fragments of radiolabelled mRNA to the 
interacting protein. After immunoprecipitation compari- 
son of Western blot and Phosphorimager analysis of an 
SDS-gels allows to identify the 5’-**P-labeled protein. SU 
and SG represent 4-thioU and 6-thioG nucleotides respec- 
tively. Cross-links are illustrated as in Fig. 1.2 


be UV cross-linked directly to proteins from cell 
extracts at 254 nm. Subsequent immunoprecipita- 
tion and separation by denaturing gel electropho- 
resis allows resolving RNase-protected 
3?P_labelled mRNA fragments cross-linked to 
specific RNA binding proteins. Overlay of 
Western-blot and autoradiography analysis of 
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the SDS-PAGE gel can allow the detection of 
pairwise protein-RNA interactions. Critical to 
the success of these experiments is the prepara- 
tion of the in vitro 4-thioU labelled RNA, in 
particular the optimization of the 4-thioUTP: 
UTP ratio during transcription [59, 60]. UV 
crosslinking using mRNAs containing 
4-thiouridine has allowed the characterization of
RNA-protein interactions between various
mRNAs and ribosomal proteins as well as numerous
translation initiation factors including eIF3 and
eIF4GII [61-63]. These experiments have also
revealed the mRNA binding function of 
Gemin5, a component of the SMN chaperone 
complex, and its role in translation [64]. We 
have recently developed UV and chemical 
crosslinking methods specifically dedicated to 
track proteins bound to the 5’ cap structure of an 
mRNA of interest during the translation process 
in cell-free extracts [65]. Site-specific 
incorporation of 6-thioG can be achieved after 
in vitro transcription of the mRNA at the m⁷G
capping step. The modified cap is added post-
transcriptionally by the Vaccinia Capping Enzyme
(VCE). At this stage 6-thioGTP (s⁶G) is
substituted for the usual GTP, to generate an
m⁷s⁶G-cap structure. UV cross-linking can be
achieved at 365 nm (Fig. 1.3b). Chemical 
crosslinking of the 5’ cap of mRNAs is an alter- 
native approach to study RNA binding proteins 
such as initiation factors; it was initially used to
identify the cap-binding protein eIF4E [66]. The
target mRNA transcript, containing a radioactively
labelled m⁷[³²P]G(5’)pppG cap, is oxidized
with sodium periodate and incubated in cell-free 
extracts leading to the formation of Schiff bases
between the 5’ cap and amino groups of
interacting proteins [65]. Subsequent RNase A
digestion leaves the radiolabelled cap
covalently linked to its protein partners that can
be identified directly by Western blot or after 
immunoprecipitation. These methods allowed us 
to follow the histone H4 mRNA cap binding 
proteins during H4 mRNA translation with the 
use of specific translation inhibitors for different 
steps [65, 67]. 


1.4 Direct mRNA IP: Anti-m⁶A, m⁵C
and Anti-TMG IP


While mRNP assembly and remodelling is essen- 
tial to gene expression control, mRNA post- 
transcriptional modifications have emerged as an 
additional layer of gene expression regulation 
[68]. Different types of nucleotide modification 
have been documented in eukaryotic mRNAs. 
These include internal mRNA modifications 
such as 5-methylcytosine (m⁵C) [69], and
N⁶-methyladenosine (m⁶A) [70], that play regu-
latory roles, but also m⁷G and m₃²,²,⁷G 5’ cap
modifications. Mammalian mRNAs, synthesized
by RNA polymerase II (Pol II), are generally
characterized by the presence of a
7-methylguanosine (m⁷G) cap structure at their
5’ end [71]. The m⁷G cap is required for mRNA
processing, translation initiation, mRNA transport,
splicing and degradation [72-77]. We showed that
several selenoprotein mRNAs bear a trimethylated
m₃²,²,⁷G cap (Fig. 1.4a) and are thus subjected to a
non-conventional translation initiation mechanism
[31]. Only a limited number of mRNA
immunoprecipitation methods allow the direct
recovery of endogenous mRNAs using antibodies
directed against specific nucleotide modifications;
these include detection of the m₃²,²,⁷G cap, m⁶A and
m⁵C (Table 1.1).

Fig. 1.4 Anti-m₃²,²,⁷G cap immunoprecipitation (TMG-
IP) experiment. (a) Structure of the m₃²,²,⁷G cap. Com-
pared to the canonical m⁷G cap, two additional methyl
groups are present on the exocyclic N2; they are
represented in red. (b) TMG-IP workflow. RNA extracted
from cellular extracts is immunoprecipitated with anti-
TMG serum (anti-m₃²,²,⁷G Ab, R1131). The bound RNA
is analysed by Quantitative Real Time-PCR
(qRT-PCR). (c) Polysome analysis of the distribution
of TMG-capped mRNAs in ribosome bound and free
polysome fractions. Cytoplasmic
extracts are fractionated onto 7–47% (w/v) linear sucrose
gradient after cycloheximide treatment in order to block
translation elongation. A typical absorbance profile is
represented in the top panel and the positions of the
polysomes, 80S ribosome as well as free RNAs are
indicated. The middle panel represents relative mRNA
abundance in each fraction measured by qRT-PCR in the
case of the GPx1 selenoprotein mRNA and a control
mRNA. Vertical bars mark the position of the polysome
as well as RNP and free fractions that are pooled and can
be analysed by TMG-IP. The amount of RNA
immunoprecipitated from the polysome and RNP pools
is measured separately by qRT-PCR and normalized to
100%. Error bars represent standard deviations of an aver-
age of two independent experiments


1.4.1 m⁶A Me-RIP and m⁵C RIP

N⁶-methyladenosine (m⁶A) is the most prevalent
internal (non-cap) modification present in the
messenger RNA of higher eukaryotes [78-80].
This RNA methylation is reversible and
may dynamically control mRNA metabolism.
RIP methods called anti-m⁶A Me-RIP [81] or
m⁶A-seq [82], based on antibody-mediated cap-
ture followed by either qRT-PCR or massively
parallel sequencing, have been developed. They
allowed the identification of m⁶A-modified
mRNAs [83] and transcriptomes (methylome) in
human cells and mouse tissues and revealed m⁶A
enrichments within long exons and around stop
codons [84, 85], further suggesting fundamental
regulatory roles of m⁶A. 5-methylcytidine (m⁵C)
is another mRNA modification that plays a role in
gene regulation [86, 87]. Anti-m⁵C RIP is based
on the use of monoclonal antibodies that specifi- 
cally bind 5-methylcytosine [88] and that have 
been broadly used in pull-down experiments of 
modified DNA molecules. These antibodies were 
raised against 5-methylcytosine nucleotide conju- 
gated to ovalbumin without the ribose or deoxy- 
ribose sugar and are blind to the DNA/RNA 
context. Both m⁶A Me-RIP and m⁵C RIP
methods appear to be better suited for establishing
global mRNA modification patterns rather than
for the specific recovery of targeted mRNAs,
and they are usually associated with other
epitranscriptome characterization technologies 
(for a review see [89]). 


1.4.2 Trimethylguanosine-Capped
mRNA Immunoprecipitation
(TMG-IP)


We have designed immunoprecipitation methods 
for the detection of endogenous m₃²,²,⁷G-capped
selenoprotein mRNAs from total cell extracts
or after polysome fractionation of cytoplasmic
extracts [90]. The detection of the
m₃²,²,⁷G-capped RNAs can be performed by
immunoprecipitation experiments using a highly
specific anti-m₃²,²,⁷G cap R1131 serum (or anti-
TMG serum) that was demonstrated not to recog-
nize monomethylated caps [91, 92]. In
contrast, antibodies developed against the m⁷G cap
recognize both m⁷G and m₃²,²,⁷G modifications.
Experiments based on the use of anti-TMG serum
are called TMG-IP and have been used success-
fully to characterize the m₃²,²,⁷G-capped small
non-coding RNAs such as snRNAs and
snoRNAs. We have adapted this assay coupled
to real time quantitative PCR for the detection
of endogenous m₃²,²,⁷G-capped selenoprotein
mRNAs in vivo from cultured cells (Fig. 1.4b)
[31, 90]. The fraction of the m₃²,²,⁷G-capped
selenoprotein mRNAs recovered by TMG-IP
does not exceed 5–15%, whereas this level can
reach close to 100% for m₃²,²,⁷G-capped
snoRNAs. Factors such as the larger size,
lower abundance, reduced stability and
shorter half-life of mRNAs compared to
non-coding RNAs may contribute to their lower
recovery in the TMG-IP. The contribution of 
potential mRNA folded structures in the 5’UTR 
must also be taken into consideration in the rec- 
ognition processes of individual mRNAs by the 
antibody. The TMG-IP method can also be 
applied to evaluate the ability of m₃²,²,⁷G-capped
mRNAs to associate with actively translating
ribosomes and to recover mRNAs after polysome
fractionation of cytoplasmic mRNAs (Fig. 1.4c).
To this end, cytoplasmic extracts must be recov-
ered from cycloheximide-treated cells in order
to block the translation elongation step and
fractionated on linear 7–47% sucrose gradients.
TMG-IP experiments can then be performed on
pooled fractions that contain either polysome-associated
or free (non-polysome-associated) RNAs. This allows
estimation of the ratio of ribosome-bound to
free m₃²,²,⁷G-capped mRNAs [31].
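
The recovery and distribution figures discussed above come down to simple arithmetic on qRT-PCR values. The following minimal Python sketch illustrates one common way of doing this calculation; it is not the authors' published pipeline. It assumes an idealized, constant amplification efficiency (2 per cycle by default) and a known fraction of the sample set aside as input. The function names `percent_recovery` and `pool_distribution` and all numerical values are hypothetical and chosen for illustration only.

```python
import math


def percent_recovery(ct_input, ct_ip, input_fraction=0.1, efficiency=2.0):
    """Percentage of a target RNA recovered in an IP, from qRT-PCR Ct values.

    Assumptions (not taken from the chapter): the input sample is a known
    fraction of the material used for the IP, and the amplification
    efficiency is constant (2.0 means perfect doubling per cycle).
    """
    # Shift the input Ct to the value expected if 100% of the material were assayed.
    adjusted_input_ct = ct_input - math.log(1.0 / input_fraction, efficiency)
    # Each cycle of difference corresponds to one factor of `efficiency`.
    return 100.0 * efficiency ** (adjusted_input_ct - ct_ip)


def pool_distribution(amounts):
    """Normalize IP amounts from pooled gradient fractions so they sum to 100%."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total immunoprecipitated amount must be positive")
    return {pool: 100.0 * value / total for pool, value in amounts.items()}


if __name__ == "__main__":
    # Hypothetical Ct values and pool amounts, for illustration only.
    print(round(percent_recovery(ct_input=25.0, ct_ip=25.0), 2))
    print(pool_distribution({"polysomes": 3.0, "free": 1.0}))
```

In practice the amplification efficiency of each primer pair would be measured from a standard curve rather than assumed, and the normalization to 100% mirrors the way the polysome and RNP pools are compared in Fig. 1.4c.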


1.5 Conclusion 


Protein-RNA interactions and their modularity 
play a central role in mRNA fate, including
assembly, modification, transport, translation
and degradation. While many methods have been
developed to examine mRNP complexes, there are 
still significant challenges that need to be 
addressed. The precise nature of the protein 
complexes that interact with most RNAs in the 
cell is still poorly understood and exploring low 
abundance transcripts remains a difficult task. The 
advent of high-throughput methods as well as new 
computational approaches to address protein-protein
and protein-RNA networks is transforming
our understanding of mRNP architecture as well
as biology. In this chapter we have presented
protein-centric techniques designed to reveal and
validate targeted mRNA-protein interactions.

Abstract


Throughout their entire life cycle, RNAs are 
associated with RNA-binding proteins (RBPs), 
forming ribonucleoprotein (RNP) complexes 
with highly dynamic compositions and very 
diverse functions in RNA metabolism, including 
splicing, translational regulation, and ribosome
assembly. Many RNPs remain poorly
characterized due to the challenges inherent in 
their purification and subsequent biochemical 
characterization. Therefore, developing methods 
to isolate specific RNA-protein complexes is an 
important initial step toward understanding their 
function. Many elegant methodologies have 
been developed to isolate RNPs. This chapter 
describes different approaches and methods 
devised for RNA-specific purification of a target 
RNP. We focused on general methods for 
selecting RNPs that target a given RNA under 
conditions favourable for the copurification of 
associated factors including RNAs and protein 
components of the RNP.

2.1 Introduction: Issues
and Challenges of RNP
Purifications


The use of small, genetically introduced affinity 
tags for the production and purification of recom- 
binant proteins and their complexes with other 
proteins has greatly improved our knowledge of
their function in a variety of research fields. The 
usual protein affinity tags include polyhistidine 
(poly-His), the hemagglutinin epitope (HA-tag), 
myc epitope, glutathione S-transferase (GST-tag), 
Maltose-Binding Protein (MBP-tag), Strep-tag, 
FLAG epitope (Flag-tag), protein A, etc. These 
affinity tags all bind with high affinity to a ligand 
that can be immobilized on a chromatography 
resin for further purification. In most cases, the 
bound complexes can either be released from the 
resin by competitive elution or cleaved off by a 
protease with a recognition site incorporated in the 
fusion protein. The widespread use and success of 
protein affinity tags has led to the development of 
comparable tags for nucleic acids leading to new 
RNA-centric methods for affinity-purification of
RNA-binding proteins (RBPs) or ribonucleo-
particles (RNPs). RNA-centric methods target
RBPs or RNPs binding to a single RNA of interest.
The majority of the existing methods use tagged
RNAs as baits to capture and study complexes
bound to them. In recent years, particular
emphasis has been placed on developing new
affinity tags for RNA molecules. Between the
early studies in the 1990s and today’s methods, a
large variety of diverse RNA tags have been 
described with various applications including 
functional analyses, in addition to mass-spectrom- 
etry, cell imaging, or three-dimensional structure 
determination. 

Different approaches have been used to tag 
RNA molecules. (i) RNAs can be chemically 
tagged during in vitro transcription through the 
incorporation of modified ribonucleotides that 
contain biotin, fluorescent dyes, or other 
compounds. (ii) Well-characterized protein-bind- 
ing RNA sequences can be incorporated during 
in vitro or in vivo transcription. Such natural 
sequences might be further optimized in size and 
affinity. (iii) Similarly, artificially selected RNA 
aptamers can be incorporated during transcription. 
Because of their small sizes, such aptamer tags are 
easy to insert into the RNA of interest. The tagged 
RNAs and assembled RNPs are subsequently 
purified by affinity-purification on an adapted 
resin. The limiting step often consists in the elution 
of the assembled complexes in native conditions 
since the binding affinities are typically high. 
(iv) Hybridization of biotinylated oligonucleotides 
that are complementary to the RNA target is an 
efficient purification strategy. It requires accessible 
single-stranded regions and elution can be 
achieved under denaturing conditions, or displace- 
ment by a competitor oligonucleotide, or by 
targeted RNase degradation. (v) Alternatively, the 
RNA sequence of interest may be ligated to a 
biotinylated-DNA oligonucleotide in order to 
yield chimeric RNA-DNA molecules that can effi- 
ciently bind streptavidin beads. Benefits of this 
procedure include the absence of constraints 
resulting from the use of a foreign structured tag. 
In addition, the elution is performed in native 
conditions by DNase or targeted RNase degradation
that releases the RNA-protein complexes
without any extra sequence. Each of these methods
has particular advantages and disadvantages for 
the tagging or affinity isolation or elution of 
RNPs. They will be discussed hereafter. 


2.2 First Affinity Purification
Methods


Two early studies performed in the 1990s established
the foundations of affinity purification based
on a natural RNA regulatory sequence as bait
[1, 2]. The studies focused on the Iron Responsive 
Element (IRE) that binds the Iron Responsive Ele- 
ment Binding-Protein (IRE-BP) involved in iron 
homeostasis. In the first study, the RNA transcript 
containing the regulatory sequences upstream of a 
biotinylated unstructured tail was added in solution 
to unfractionated lysates containing the protein 
target. Then, the high affinity interaction between 
biotin and avidin was used to attach the IRE/IRE- 
BP to a solid matrix. For that, the assembled 
mRNA-protein complexes were bound to biotin-
agarose beads through a succinylated-avidin inter-
mediate (Fig. 2.1a). IRE-BP was subsequently
eluted from the RNA with 2 M KCl [1]. In the
second study, in vitro transcribed polyadenylated 
RNA was hybridized to a poly(U)-Sepharose resin 
and incubated with pre-purified cellular extracts. 
The RNA-protein complexes were assembled, and 
the adsorbed proteins were recovered by elution 
with 1 M KCl [2] (Fig. 2.1b). Although these two
seminal studies achieved mRNP assembly on a
specific RNA target, the complex was dissociated
during the high-salt elution. Nevertheless, they
successfully tested two RNA-tagging procedures.
The first used biotinylated nucleotides
inserted in the RNA for binding to the beads, 
and the second procedure took advantage of 
the RNA-RNA hybridization to immobilize 
RNA-protein complexes on a solid matrix. 

Over the same period, a more general affinity 
purification method that allowed the isolation of 
specific RNAs and RNA-protein complexes was 
described [3]. The approach, illustrated in 
Fig. 2.1c, hinges on the specific interaction
between the bacteriophage R17 coat protein and 
a short hairpin found in its genomic RNA. In this
strategy, the R17 coat protein (CP) was cova-
lently attached to beads to retain RNA sequences
containing the short hairpin sequence. To
improve the interaction between the CP and
RNA hairpin, a high affinity variant of the RNA
hairpin was used (Kd = 3.5 × 10⁻¹⁰ M [4]). A
hybrid RNA containing two R17 recognition
sites fused to the RNA sequence of interest was
prepared either in vitro, by transcription with a
phage polymerase, or in vivo, by cellular
transcription of a transfected or injected DNA
template. After incubating the chimeric RNA
with appropriate factors in the cell or lysate, the
resulting RNA-factor complexes were retained on
the resin coupled to the R17 coat protein. The
binding of RNAs to the beads turned out to be
rapid, efficient, and highly selective. After eluting
the column with an excess of R17 recognition
sites, the authors could obtain biologically active
factors and complexes of interest [4].

Fig. 2.1 Outline of the three methods of purification of a
specific RNA protein complex from a mixture of proteins.
(a) Schematic representation of the affinity purification
of IRE-BP based on Biotin-Avidin affinity. The allylamine
uridine triphosphate was randomly incorporated
throughout the entire transcript before coupling with
succinimidyl-biotin. Bound IRE-BP was subsequently
eluted with 2 M KCl. IRE, Iron Responsive Element;
IRE-BP, IRE-Binding protein. (b) Representation of the
affinity purification of IRE-BP based on polyU resin. In
vitro transcribed polyadenylated RNA was bound to
poly(U)-Sepharose. Prepurified IRE-BP was specifically
adsorbed to the IRE and subsequently eluted with 1 M
KCl. The IRE here shown is one of the transferrin IRE
elements. (c) Affinity purification of snRNP U1 based on
the affinity of R17 coat protein for its RNA-binding ele-
ment. A chimeric RNA containing two R17 recognition
sites inserted in U1 RNA was prepared either in vitro, by
transcription with a phage polymerase, or in vivo, by
cellular transcription of a transfected or injected DNA
template. The chimeric RNA binds to appropriate factors
in the cell extract. The resulting complexes are selectively
retained on a support to which R17 coat protein has been
covalently coupled. Specific RNA molecules and any
associated factors are eluted with an excess of R17 recog-
nition elements

2.3 Affinity Purification Based
on Artificially Selected
or Natural RNA Motifs


Over the years, a large variety of aptamer-based 
affinity purification approaches have been devel- 
oped for the isolation of in vitro-assembled 
RNA-protein complexes. These include the use 
of StreptoTag aptamers [5], streptavidin aptamer 
[6], Sephadex aptamer [7], tobramycin aptamers 
[8], and Mango aptamer [9]. Other natural RNA 
elements have been used such as the bacterio- 
phage R17 MS2 sequence (binding site of the 
coat protein CP), or the equivalent PP7 RNA 
from Pseudomonas aeruginosa bacteriophage 
PP7 [10-13]. 


2.3.1 Antibiotic-Binding RNA
Aptamers


One of the first methods for affinity-purification 
was based on the RNA sequence named 
StreptoTag aptamer [5]. This 48-nt long RNA 
element binds with high specificity to the
antibiotic streptomycin. It was isolated via SELEX
and binds to the antibiotic with a dissociation
constant (Kd) of around 1 µM (Fig. 2.2a). The
StreptoTag was inserted by in vitro transcription 
in a hybrid RNA and was incubated with crude 
extract. The assembly reaction was then applied to 
a dihydrostreptomycin-Sepharose column. The 
resulting bound complexes were washed and spe- 
cifically eluted upon addition of free streptomycin. 
Using this purification scheme, properly assem- 
bled spliceosomal U1A protein and the bacterio- 
phage MS2 coat protein could be isolated via their 
appropriate RNA motif [5] as well as 48S ribo- 
somal complexes assembled on an IRES element 
in milligram quantities [14]. The method turned 
out to be more specific than the procedures that use 
poly-U Sepharose [2] or biotin/streptavidin beads 
[1] that often copurify unspecific proteins [5]. One 
of the drawbacks of the method consists in the 
preparation of the dihydrostreptomycin-Sepharose 
column. Although the protocol for coupling strep-
tomycin to Sepharose 6B is well established, it is a
time-consuming process, and the coupled matrix
can only be stored for a maximum of 4 weeks.
In addition, the quality of the matrix may be
subject to batch-to-batch variation.

A similar affinity-purification procedure was 
developed using an RNA aptamer selected to bind
the aminoglycoside antibiotic tobramycin
with 5 nM affinity [15, 16] (Fig. 2.2a). The tobramycin-
binding aptamer was used to purify preparative
amounts of human pre-spliceosome complex. The 
method turned out to be ideally suited to isolate 
high molecular weight complexes in their native 
form and for functional and structural studies 
[8, 16]. However, the drawbacks of the method 
are similar to those of the StreptoTag and mainly 
concern the preparation and preservation of the 
tobramycin-coupled Sepharose support. 
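
To put the affinities quoted above into perspective (a Kd of around 1 µM for the StreptoTag and about 5 nM for the tobramycin aptamer), the fraction of aptamer occupied by ligand can be estimated with a simple binding isotherm. The sketch below is illustrative only: it assumes 1:1 binding at equilibrium with free ligand in large excess over the aptamer, and the function name `fraction_bound` and the 100 nM ligand concentration are hypothetical. Ligands immobilized on a resin are presented at high local density, so these numbers are not a prediction of column performance.

```python
def fraction_bound(ligand_conc_molar, kd_molar):
    """Fraction of aptamer bound at equilibrium for simple 1:1 binding.

    Assumes the free ligand concentration is approximately equal to the
    total ligand concentration (ligand in large excess over the aptamer).
    """
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)


# Dissociation constants quoted in the text (approximate values).
KD_MOLAR = {
    "StreptoTag / streptomycin": 1e-6,
    "tobramycin aptamer / tobramycin": 5e-9,
}

if __name__ == "__main__":
    ligand = 100e-9  # hypothetical free ligand concentration: 100 nM
    for name, kd in KD_MOLAR.items():
        print(f"{name}: {fraction_bound(ligand, kd):.3f} bound")
```

Under these idealized assumptions, the tighter tobramycin aptamer would be almost fully occupied at a ligand concentration where the StreptoTag is mostly free, which illustrates why the two tags can behave differently during washing and competitive elution.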


2.3.2 Streptavidin and Sephadex
Aptamers


The S1 Streptavidin-RNA binding aptamer was 
selected based on a tight binding to a commonly 
available target molecule in such a way that the 
ligand—RNP complex can be selectively and gently 
dissociated afterwards. The S1 Streptavidin tag 
binds specifically to streptavidin and can be eluted 
by competition with biotin. Streptavidin is a protein 
produced by Streptomyces avidinii, which binds to 
biotin with extraordinarily high affinity. The
biotin-streptavidin interaction is one of the strongest
non-covalent biological interactions found in
nature (Kd of about 10⁻¹⁴ M). Streptavidin is commercially
available in many forms, either as a purified protein 
or conjugated with enzymes, dyes, or many 
supporting matrices. The Streptavidin aptamer, used as
an affinity tag (Fig. 2.2a) inserted into the
large RNA subunit of RNase P,
was successfully used to purify the active form of
the ribonucleoprotein RNase P [6]. The S1 aptamer
was further used for the isolation of an mRNP 
involved in translation activation [17] and purifica- 
tion of mRNA-interacting proteins from human 
cells and Drosophila embryo extracts using 
in vitro transcribed RNAs attached to streptavidin 
via the S1 tag [18, 19]. It was also shown that the
addition of a tRNA scaffold to the Streptavidin
aptamer increased binding efficiency about tenfold.
Similarly, by optimizing the RNA aptamer S1
in structure and repeat conformation, the affinity
for Streptavidin was increased and found to be
optimal with a fourfold repeat of S1m (4×S1m)
and to be more efficient than the established MS2 and
PP7 systems from bacteriophages [20].

Fig. 2.2 Strategies of purification of RNPs based on artifi-
cial RNA aptamers or natural tags. (a) Sequences and
predicted secondary structures of different RNA aptamers
that have been used to develop purification strategies. The
streptomycin RNA aptamer sequence (StreptoTag) binds
with high affinity to dihydrostreptomycin coupled to a
sepharose column matrix. The tobramycin aptamer binds
efficiently to tobramycin-derivatized sepharose beads. The
streptavidin aptamer (S1) specifically binds to streptavidin
conjugated to many commercial supports. S1 was
optimized into S1m, which can be repeated (4×S1m). Elution
is obtained under mild conditions by adding biotin. The
Sephadex aptamer is specific to the Sephadex matrix. Elu-
tion is performed by dextran B512, which is the base
material of Sephadex. (b) Several natural sequences from
bacteriophages have been used as tags to affinity-purify
RNPs. The MS2 hairpin is the most frequently used, with
a variable number of copies in the RNA target. (c) Since the
affinity of the repeated MS2 sequence for the coat protein
(CP) is too high for an efficient elution, several fusion
proteins have been engineered in order to affinity-bind the
MS2 tag on one side (CP) and additional peptides with
different binding properties. The Maltose-Binding Protein
(MBP) binds to amylose columns and release of the
complexes is obtained by adding maltose or by protease
cleavage at specific sites inserted between the fused
domains. Other domains have been fused with the CP
domain. The GST domain binds to glutathione resin, the
zz domain of protein A (zzProtA) binds to IgG coated beads
and the fusion with the Streptavidin-Binding Peptide (SBP)
allows binding to streptavidin coated resins whereas the
GFP domain adds fluorescent properties. In all these cases,
co-expression or co-purification of the fusion proteins is
needed

The advantages of the Streptavidin tag include 
an elution step with biotin, which is done under 
mild conditions. Mild conditions preserve the 
particle integrity and increase purity since 
non-specific contaminants that bind to the column 
support would not elute (in contrast to stringent
elution with high salt, pH changes, or
denaturing agents). In addition, streptavidin
matrices are widely available as agarose beads
or magnetic particles. This eliminates the neces- 
sity of preparing a specific affinity resin. 

Sephadex-binding RNA aptamers were selected 
against Sephadex, a commonly used matrix in gel 
filtration (Fig. 2.2a) [7]. Sephadex is widely used in 
many laboratories and readily available. It is rela- 
tively inexpensive, making large-scale purification 
more affordable. Sephadex is stable and can be 
regenerated several times. Its basic structure, 
formed of glucose repeats connected via α-1,6
glucosidic bonds, provides a high binding capacity. 
Using Sephadex G-100, the Sephadex aptamer 
could be purified from a complex mixture of cellu- 
lar RNA, with an enrichment of at least 60,000- 
fold. Similarly, yeast nuclear RNase P containing 
one RNA molecule tightly associated with 9 protein 
subunits could be purified and eluted by competi- 
tion with soluble dextran B512 [21]. 


2.3.3 Affinity Purification with RNA
Tags Derived from Natural
Sequences


A variety of studies have focused on natural RNA 
tags that are small and form structured RNA 
motifs that can fold independently of the tagged 
RNA (Fig. 2.2b). They bind specific ligands with 
high affinity. They include the MS2 RNA hairpin 
[10-13, 22, 23], the Pseudomonas phage 7 (PP7) 
RNA hairpin for coat protein [24-28], the 19 nt
of the B-box of lambda bacteriophage that bind
the antiterminator protein N (22 amino acid
RNA-binding domain) [29-31], or the U1
snRNA hairpin II that binds the U1A protein [32].

The most widely used affinity purification with 
RNA tag is that of the MS2 coat protein (CP) and 
its cognate RNA. MS2 is a member of a family 
of closely related enterobacterial viruses that 
includes bacteriophage f2, bacteriophage Qβ,
and R17. The coat proteins of these single- 
stranded RNA bacteriophages are translational 
repressors of their viral replicase. They achieve 
this function by specifically binding an RNA 
hairpin that encompasses the replicase start 
codon. This interaction is generally conserved 
among the RNA bacteriophages with variations 
in the sequences of the coat proteins and RNA 
hairpins during evolution. 

Initially developed by Bardwell & Wickens in 
1990 (see above) with the R17 recognition 
sequence originating from the phage R17, the 
MS2 tag of bacteriophage MS2 has similar 
sequence and binds the coat protein (CP) with 
similar high affinity. Three copies of the MS2 
tag are usually placed in the RNA of interest in 
order to improve its binding efficiency. Since the 
affinity of the repeated MS2 sequence for MS2 
CP is too high to consider an efficient elution by 
direct affinity purification, a hybrid protein is 
used as an intermediate to affinity-bind the MS2 
sequences on one side (CP moiety) in fusion with 
the Maltose-Binding Protein (MBP moiety) in
order to interact with the resin on the other side. 
Therefore, there is no requirement for the MS2 
CP to be immobilized, and the isolation can be 
performed using the MBP and not the MS2 coat 
protein. The most used fusion protein involves 
the N-terminal domain of the MBP combined 
with the MS2 CP part with a double mutation 
(V75Q and A81G) that prevents the oligomeriza- 
tion [33]. The MS2 CP-MBP fusion protein is 
expressed in Escherichia coli and after affinity 
purification the complex can be released from 
the affinity resin by elution with maltose 
(Fig. 2.2b, c). Alternatively, the elution can be 
performed using a protease cleavage site located 
between the MS2 coat protein site and the protein 
that binds directly to the affinity resin. In this
case, purified complexes are released from the
resin by cleavage with a specific protease.
Regardless of whether the elution proceeds via
maltose or by protease cleavage, the MS2
CP-MBP or MS2 CP remains bound to the
mRNA-protein complex, which may represent a 
disadvantage for structural or functional studies. 
This approach has been successfully used to 
purify many large molecular weight complexes 
such as U2 snRNP with the ATP-independent 
spliceosomal complex [10], ribosomes on IRES 
elements [12] or native spliceosomes [11, 13, 22, 
34]. 

Variations of the fusion MS2 CP-MBP include 
the MS2 CP-GST which binds to Glutathione- 
beads with an elution step by glutathione 
[35]. An immunoglobulin variant was proposed 
with the MS2 CP domain fused to the 
immunoglobulin-binding domain of the Staphylo- 
coccus aureus protein A (ZZ domain) (Fig. 2.2c). 
The complex of interest could be purified from a 
cell lysate using immunoglobulin-conjugated 
beads [36]. The MS2-BioTRAP method is another 
RNA tagging system designed for purification of 
in vivo assembled RNPs. In this method, the RNA 
of interest harbouring multiple MS2 RNA 
elements is co-expressed with a MS2 CP fused to 
the HisBio tag (HB tag). The HB tag adds to the 
MS2 CP two hexa-His tags, a TEV cleavage site, 
and a signal sequence for in vivo biotinylation. 
Here, the high affinity of biotin for streptavidin is 
used to isolate the endogenously biotinylated 
HB-tagged protein associated with the 
MS2-tagged RNA within the RNPs [37]. In addi- 
tion, the MS2 aptamer inspired numerous derived 
methods for targeting RNPs or miRNAs such as 
RaPID [38], MS2-TRAP [35], and Ribotrap 
[36]. The MS2-CP was also fused with Green 
Fluorescent Protein (GFP) to visualize mRNAs 
bearing the MS2 aptamer in vivo [39-41]. MS2-CP-GFP
can also be fused with the streptavidin-binding
protein (SBP) to give MS2-CP-GFP-SBP,
which can be visualized by fluorescence
and purified using streptavidin-conjugated
beads [38]. However, these methods are
beyond the scope of this article.

Generally, all these methods are relatively 
flexible and widely applicable, and in some 
cases, their in vivo applicability allows the study
of protein-RNA interactions in physiological 
conditions. Tagged RNAs can be stably 
expressed in appropriate mammalian cell lines 
and RNPs assembled on the tagged RNAs may 
be purified from extracts using the appropriate 
affinity purification method. The primary use of 
such techniques has involved identification of 
mRNP components by mass spectrometry, but 
purified RNPs may also be used for downstream 
functional assays. However, in some cases, the
incorporation of a tag into the RNA bait may alter
its secondary structure and possibly the formation
of ribonucleoprotein complexes. In addition,
the presence of the exogenous tag may
induce binding of additional nonspecific
RNA-binding proteins.


2.4 RNP Capture Based on Antisense Biotinylated Oligonucleotides

These methods use antisense oligoribonucleotides
that are complementary to
the RNA of interest (mRNA or RNP). The
complexes are enriched and purified
from cell extracts by hybridization. Usually, one
or several biotin residues are included at the 5'
end of the oligonucleotide, or internally, to
immobilize it on a streptavidin column. Following
hybridization, binding and extensive washing
steps, the RNA of interest is eluted with a
displacement oligonucleotide or by targeted RNase
degradation (Fig. 2.3a). In these methods, the RNA
of interest needs no special preparation; the
antisense oligonucleotide is simply chosen to be
specific for the RNA sequence. Usually, 2'-O-alkyl (methyl
or allyl) oligoribonucleotides are preferred since
they are resistant to nuclease degradation in crude
extracts containing high levels of endogenous
RNase or DNase activity [42]. Elution of bound
RNP from the antisense oligonucleotide can be
achieved under non-denaturing conditions, using
a displacement oligonucleotide that forms a
thermodynamically more stable duplex with the
antisense oligonucleotide [43-45]. Alternatively,
a DNA oligonucleotide complementary to
another part of the RNA of interest is added
together with RNase H, which specifically
cleaves the DNA/RNA duplex and elutes
the RNP under native conditions [46, 47].

Fig. 2.3 [...] oligonucleotides. (a) Affinity purification with an
antisense oligonucleotide and elution with RNase H (lower). The
RNP of interest is assembled on the RNA, which is immobilized
on streptavidin beads via a biotinylated antisense RNA
oligonucleotide (grey). The antisense oligonucleotide
contains 2'-O-alkyl sugars in order to resist degradation
in cell extracts. After washing, the RNP is eluted
by adding a DNA oligonucleotide, which
forms the cleavage site for RNase H. (b) This method is
based on the formation of RNA-DNA chimeric molecules
that resist the RNase H activity present in cell extracts.
T4 DNA ligase catalyses the ligation of RNA-DNA
hybrids if these are joined by base pairing to a "splint
oligonucleotide". To circumvent the problem of imperfect
pairing caused by heterogeneous 3' ends, which arise from
the tendency of RNA polymerases to incorporate non-encoded
nucleotides in vitro, a degenerate splint carrying a mixture of all
four nucleotides is used. After removal of the splint oligonucleotide mixture,
RNPs are assembled in rabbit reticulocyte lysate (RRL).
Release of the complexes is performed with DNase I or
RQ1 DNase, which specifically cleave the DNA part of the
chimera. (c) In this approach, histone H4 mRNA is ligated
to a biotinylated DNA oligonucleotide (in grey) in two
successive steps using M. thermoautotrophicum RNA
ligase (MthRnl) and truncated T4 RNA ligase, forming a
chimeric RNA/DNA molecule. Assembly of translation
initiation complexes on the mRNA is performed in RRL.
Elution is performed with RNase H as in panel a. (d)
Representative denaturing urea PAGE gel showing the
purification of 80S particles assembled on H4 mRNA by
the method described in panel c. The chimeric mRNA-biotinylated
DNA (lane 1) was loaded on the beads

To apply this technique successfully, an
unstructured and accessible region of the RNA
must be available to hybridize with the antisense
oligoribonucleotide. This method can therefore
be difficult to apply to highly structured RNAs and
RNPs. In addition, forming a duplex on a previously
single-stranded region of the RNA may
destabilize the RNP architecture. On the
other hand, compared to methods that incorporate 
a structured RNA tag, this method avoids folding 
problems of the RNA due to the foreign sequence. 
In addition, several different hybridization sites in 
the RNA target can be quickly tested for accessi- 
bility to synthetic oligonucleotide probes. 
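
The design step described above, choosing antisense probes against several candidate sites, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the window length and the tiling step are assumptions introduced here, and accessibility of each site still has to be tested experimentally.

```python
# Watson-Crick partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(target_rna: str, start: int, length: int) -> str:
    """Return the antisense RNA oligo (written 5'->3') for a target window.

    `start` is a 0-based index into the target, which is written 5'->3'.
    """
    window = target_rna.upper().replace("T", "U")[start:start + length]
    if len(window) != length:
        raise ValueError("window extends past the end of the target")
    # Reverse complement: the oligo pairs antiparallel to the target.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(window))


def candidate_oligos(target_rna: str, length: int, step: int):
    """Tile the target so that several hybridization sites can be screened."""
    for start in range(0, len(target_rna) - length + 1, step):
        yield start, antisense_oligo(target_rna, start, length)


if __name__ == "__main__":
    # Hypothetical target sequence, for illustration only.
    for start, oligo in candidate_oligos("AUGGCUACGUUAGC", length=6, step=4):
        print(start, oligo)
```

The biotin and 2'-O-alkyl modifications are ordered as part of the synthesis and are not represented in the sequence string.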


2.5 RNP Purification Based on Ligation of a DNA-Oligonucleotide Tag


This method brings together several advantages 
from all the other methods. The RNA sequence of
interest is ligated to an unstructured DNA oligonucleotide,
also called an adapter, forming a chimeric
RNA-DNA molecule. In this way, standard
chemically synthesized DNA oligonucleotides 
carrying a variety of modifications, including bio- 
tin, can be ligated to RNA. The resulting chimera 
can efficiently bind streptavidin resins or beads 
where the RNP complexes are subsequently 
assembled starting from crude cell preparations. 
Here the absence of a foreign structured tag 
reduces constraints on RNP formation and inter- 
action with non-specific molecules. 


2.5.1 Tagging RNA by Splint-Dependent Ligation with T4 DNA Ligase

In this version of the method, a standard
biotinylated DNA adapter is ligated by T4 DNA
ligase, which is able to ligate hybrids of ribo- and
deoxyribonucleotide homopolymers in double-stranded
regions [48-51]. Therefore, a splint
DNA oligonucleotide hybridizing to both the mRNA
and the biotinylated DNA oligonucleotide is added
to form the double strand and allow ligation
(Fig. 2.3b). Efficient ligation of RNA/DNA
duplexes requires stoichiometric or greater
concentrations of T4 DNA ligase because this
enzyme cannot turn over effectively on
RNA-containing duplexes [50]. In addition, since
T7 RNA polymerase produces significant amounts
of N+1 RNA products during run-off transcription, a
second splint oligonucleotide is added with an
extra nucleotide (a mix of the four nucleotides at this
position) at the junction site (Fig. 2.3b). Prior to
incubation with crude cell extracts, the splint
oligonucleotide is removed from the duplex to
prevent cleavage by endogenous RNase H (which
cleaves the RNA strand of RNA/DNA duplexes). To do so, the
immobilized chimeric mRNA is incubated for
2 min at 95 °C to unwind mRNA-splint duplexes
and then for 10 min at room temperature in the
presence of a tenfold excess of an antisense
DNA oligonucleotide complementary to the splint,
which traps the splint and prevents it from
re-annealing to the mRNA. After these washing
steps, the immobilized mRNA-DNA chimera is
ready for RNP complex capture. It is possible to
accommodate 3'-sequence variety among the RNAs of
interest by using a combination of splints, while
the biotinylated oligonucleotide remains constant.


Fig. 2.3 (continued) (7 pmoles are shown; 50% of the
mRNA molecules were ligated to the biotinylated DNA
oligonucleotide). Unbound mRNAs are shown in the
flow-through (lane 2) and washing fractions (lanes 3-4).
Translation-initiation complexes were assembled in RRL.
Unbound rRNAs and tRNAs are shown in the
flow-through (lane 5) and washing fractions (lanes 6-8). The
bound 80S particles were eluted with RNase H (lane 9)
after adding the DNA oligonucleotide forming the RNase
H cleavage site. The positions of the ribosomal RNAs,
tRNAs and histone H4 mRNA are indicated

The RNP is eluted with DNase I,
which degrades the DNA moiety of the chimera
(the adapter), releasing under native conditions the
RNP assembled on an intact RNA without
any extra tag.

It is also possible to extend the 3' end of the
RNAs with an unstructured poly(CAA) extension
that favors hybridization of the splint. For the
assembly of translation initiation complexes, the
chimeric molecule is obtained by ligating the DNA
oligonucleotide to the 3' end of the mRNA so
that ribosome binding can occur freely at the
5' end. However, the method can be adapted to
isolate RNP complexes assembled on the 3' end
of the mRNA, since nothing prevents ligation of
the biotinylated DNA oligonucleotide to the 5'
end of the mRNA.

The method has been used to assemble trans- 
lation initiation complexes on histone H4 mRNA, 
hepatitis C virus (HCV) internal ribosome entry
site (IRES), β-globin mRNA, and cricket paralysis
virus IRES, and these complexes were suitable
for cryo-EM and mass spectrometry studies 
[52-55]. 


2.5.2 Direct Ligation of 5'-Adenylated Deoxyoligonucleotides Tag by T4 RNA Ligase 2


The advent of deep sequencing has led to the
development of new methods to generate
very large libraries of DNA molecules. To
improve ligation efficiency and limit concatemer
formation, the ligation reaction is split into
two consecutive steps. ATP is first used to form
a 5'-adenylated DNA oligonucleotide (5'-AppDNA),
the biochemical intermediate
of enzymes such as DNA or RNA ligases. This
activated intermediate is a good substrate for
various enzymatic reactions, forming a building
block to assemble nucleic acids under specific
conditions. The 5'-AppDNA molecule then acts
as the substrate in the second step, catalyzed by
robust T4 ligases that work in the absence of ATP
and do not concatenate products or form circles.

The first step, 5'-adenylation of the oligonucleotide,
is catalysed by a thermostable RNA ligase from
Methanobacterium thermoautotrophicum (MthRnl)
[56, 57]. The enzyme usually catalyzes the intramolecular
ligation of single-strand RNA through
ligase-adenylate and 5'-AppRNA intermediates.
When 3'-blocked DNA oligonucleotides and a
suitable ATP concentration are used,
MthRnl accumulates 5'-AppDNA, which can serve for
efficient ligation of RNA or DNA molecules.

The second enzyme, which ligates the RNA of
interest to the activated 5'-AppDNA adapter, is a
mutated form of truncated T4 RNA ligase 2
(T4 Rnl2tr). T4 Rnl2tr is defective in
self-adenylation and readily accepts pre-adenylated
substrates for ligation [58]. Adding the R55K and
K227Q mutations to T4 Rnl2tr leads to very low
background concatenation and circularization of
the RNA molecules [58]. Because the optimized
enzyme uses pre-adenylated adapters rather than
ATP for ligation, efficiencies of up to 75%
are reached [58]. As stated above,
this procedure is commonly used to optimize
adapter ligation for high-throughput sequencing.
We recently adapted the protocol
to form the mRNA-DNA chimera for
RNP purification, and more precisely for purification
of translation initiation complexes assembled
on mRNA. With this method, we routinely
reach 50% efficiency when ligating a
biotinylated DNA adapter to the 3'-OH end of
the mRNA (Fig. 2.3c), which is much higher
than the efficiency obtained by splint-dependent
ligation with T4 DNA ligase (about 10%). The
subsequent steps of RNP assembly and washing
follow the conventional protocols. Finally,
elution is performed with RNase H after
adding a complementary DNA oligonucleotide
that forms the RNase H cleavage site in the 3'
part of the mRNA. We found that RNase H was
less sensitive than DNase I to steric clashes with
the matrix beads, and larger amounts of RNP are
recovered than with DNase elution.

2.6 Outlook

In the past few years, many approaches targeting 
the assembly of RNPs from cell lysates have been 
described. The capture of RNPs assembled 
in vitro remains valuable for the structural or 
biochemical analysis of a large variety of 
complexes and systems, including the 
spliceosome, ribosome, and other smaller RNPs 
such as the telomerase and RNase P. A critical
step of these purification processes remains the
capture of the RNA sequence of interest on a solid
support that allows its separation from the total cell
extract. This step relies on the presence, in the
RNA target, of an external element with affinity
for the solid matrix. Various types of elements
have been selected and tested. Synthetic RNA
aptamers have been artificially selected to target
a specific molecule. Once bound to a solid support,
this molecule can be used for affinity purification
of the RNP. Other elements are natural
RNA sequences exhibiting specific binding sites
for well-characterized RNA-binding proteins
(phage elements). These methods are attractive
because RNA aptamers are generally small,
bind bead matrices directly, and are eluted with
small molecules. However, to successfully tag a
given RNA, the aptamer should be placed in a 
solvent-accessible region. It is important to con- 
sider the steric effect of using such aptamers 
and the potential interference with the folding, 
structure, assembly and normal function or 
interactions of the RNP under study. Conversely, 
the RNP assembly may disrupt aptamer folding 
and therefore its ability to bind the solid support. 
Therefore, after designing the hybrid RNA, it may
be useful to predict the folding of the sequence. If
all attempts to purify RNP complexes fail, it
may be necessary to redesign the
construct by trying different tags, changing their
location in the RNA, or adding flexible spacers
to prevent steric hindrance.

The use of biotinylated oligonucleotides, 
which are complementary to accessible single- 
stranded regions or ligated to the RNA of interest, 
is an efficient alternative approach when the RNA
structure is too constrained and incompatible with
the aptamer approach. Compared to aptamers,
biotin is a discrete molecule that only slightly
modifies the RNP complex. The biotinylated
RNA, in the form of a DNA-RNA hybrid or
ligated chimera, exhibits high affinity for
streptavidin resins. Efficient assembly
and stringent washing steps reduce
the background of contaminating proteins.

In addition, elution of the RNP complexes can
be achieved under native conditions using
competing oligonucleotides or DNase/RNase, which
releases the RNP without extra RNA sequence.

In conclusion, we have presented here several
general strategies for isolating RNPs assembled
in vitro in crude cell extracts, some of which
can also be applied in vivo. Although these
methods were developed for the preparation
of specific complexes, most of them have been
successfully applied to the preparation of RNPs
involved in fundamental cellular processes.

Abstract

In the last two decades, biological mass
spectrometry has become the gold standard
for the identification of proteins in biological
samples. The technological advancement of
mass spectrometers and the development of
methods for ionization, gas phase transfer,
peptide fragmentation as well as for acquisition
of high-resolution mass spectrometric
data marked the success of the technique.
This chapter introduces peptide-based mass
spectrometry as a tool for the investigation of
protein complexes. It provides an overview of
the main steps for sample preparation starting
from protein fractionation, reduction and alkylation,
and focuses on the final step of protein
digestion. The basic concepts of biological
mass spectrometry as well as details about
instrumental analysis and data acquisition are
described. Finally, the most common methods
for data analysis and sequence determination
are summarized with an emphasis on their
application to protein-protein complexes.

Keywords

Protein alkylation - Asp-N - Bottom-up - CID
fragmentation - Data dependent acquisition -
Protein digestion - ECD fragmentation - ESI -
ETD fragmentation - Glu-C - HCD
fragmentation - LC-MS/MS - MALDI-TOF/MS -
Mascot - MS - Orbitrap - Pepsin -
Proteomics - Reduction - Sequest - Trypsin

G. Degliesposti

3.1 Introduction

Protein identification by mass spectrometry
(MS) relies on the determination of its amino
acid sequence. Two different approaches, namely
top-down and bottom-up, are adopted for this
purpose. In a top-down method the identification
is achieved by MS fragmentation of the whole
protein. On the other hand, bottom-up
(or peptide-based) analyses start with protein
digestion and peptide sequencing, followed by
protein identification by searching and matching
the peptides against databases of protein sequences
[1, 2].

The overall process of peptide-based MS can
be subdivided into three main steps: sample
preparation (Sect. 3.2), data acquisition (Sect. 3.3) and
data analysis (Sect. 3.4).

3.2 Sample Preparation

The sample preparation step consists of the [...]
fragmentation pattern can be determined by mass
spectrometry. A general workflow proceeds 
through sample denaturation, fractionation, 
reduction, alkylation and terminates with proteo- 
lytic digestion [3]. 


3.2.1 Sample Fractionation 

The digestion of protein complexes composed of 
many subunits generates significant numbers of 
different peptides. Sample fractionation aims to 
improve the MS identification of all the proteins 
in a complex by reducing sample complexity. 

Two commonly used approaches to protein
fractionation are gel electrophoresis and size
exclusion chromatography. One-dimensional
and two-dimensional gel electrophoresis separate
proteins into gel bands and spots, which are excised
and further processed in-gel. Conversely, protein
separation by size exclusion chromatography
(SEC) allows proteins to be fractionated and
collected in solution, with simultaneous buffer
exchange when optimized buffers are required
for the activity and selectivity of proteolytic
enzymes.

Despite the advantages of reducing the sample 
complexity and facilitation of buffer exchange, 
protein fractionation implies sample losses that 
can lead to biased results in cases of low protein 
abundance and where quantitative or semiquanti- 
tative analysis is required. 


3.2.2 Reduction and Alkylation of Cysteine


Cysteine-containing proteins often display disul- 
fide bonds in their tertiary and quaternary 
structures. Digestion of proteins containing disul- 
fide bonds generates non-linear cross-linked 
peptides which are more difficult to identify by 
mass spectrometry. By introducing a preliminary 
reduction of cysteines, free linear peptides are 
generated, and their identification simplified. 
The reduction reaction is usually based on DTT
(dithiothreitol), TCEP (tris(2-carboxyethyl)phosphine)
or 2-ME (2-mercaptoethanol) with
comparable performance. Reformation of
disulfide bonds is prevented by blocking the 
sulfhydryl groups with iodoacetamide to generate 
carbamidomethyl derivatives. Minor side 
reactions of iodoacetamide are observed to differ- 
ent extents in the following order of frequency: 
alkylation of N-terminus > glutamic acid > the 
C-terminus and lysine > aspartic acid > tyrosine 
> histidine [4]. 
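
For database searching, alkylation matters because each carbamidomethylated cysteine shifts the peptide mass. The Python sketch below computes a monoisotopic peptide mass with and without this modification. The residue masses and the carbamidomethyl increment (C2H3NO, +57.021464 Da) are standard values rather than figures taken from this chapter, and the function name is an assumption made here.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # added once per peptide (free termini)
CARBAMIDOMETHYL = 57.021464  # C2H3NO added to Cys by iodoacetamide


def peptide_mass(sequence: str, alkylated: bool = True) -> float:
    """Monoisotopic neutral mass of a linear peptide.

    If `alkylated` is True, every cysteine is assumed to carry a
    carbamidomethyl group; side reactions on other residues are ignored.
    """
    mass = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    if alkylated:
        mass += CARBAMIDOMETHYL * sequence.upper().count("C")
    return mass


if __name__ == "__main__":
    # Hypothetical peptide, for illustration only.
    print(round(peptide_mass("ACK", alkylated=False), 4))
    print(round(peptide_mass("ACK", alkylated=True), 4))
```

Search engines usually treat this shift as a fixed modification on cysteine, whereas the minor side reactions listed above would have to be declared as variable modifications.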


3.2.3 Protein Digestion 

After reduction and alkylation, proteins are usu- 
ally digested using a proteolytic enzyme. The 
choice of protease is extremely important to 
maximize the number of informative and MS 
detectable peptides. In certain conditions the 
combination of more than one proteolytic enzyme 
can increase sequence coverage and identify a 
higher number of proteins. A quick review of 
the most commonly used proteases is found 
below. 


Trypsin Trypsin is a serine endoproteinase that 
cleaves peptide bonds of lysine and arginine 
residues at the C-terminal side except for the 
imido bonds formed with proline. The distribu- 
tion and abundance of lysine and arginine in 
proteins enables the generation of peptides with 
a suitable size for MS identification providing 
good sequence coverage. This feature makes trypsin
the most frequently used enzyme in proteomics.
Hydrolysis at the C terminus of positively
charged residues produces peptides with a neutral
balance of charge. This improves their ionization
in ESI (electrospray ionization) sources and gives
better fragmentation in tandem mass
spectrometry, leading to more accurate peptide
identification.

Trypsin activity is optimal in the pH range of 
7-9. Mild denaturing conditions such as 1 M 
urea, 10% acetonitrile and 0.1% SDS are tolerated 
[5-7]. 
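
The cleavage rule given above (C-terminal to lysine or arginine, but not before proline) is easy to apply in silico. The following Python sketch is a minimal illustration; the function name and the optional handling of missed cleavages are assumptions made here and do not reproduce any particular search engine.

```python
import re

# Zero-width split point: after K or R, unless the next residue is P.
# Splitting on empty matches requires Python 3.7 or later.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein: str, missed_cleavages: int = 0, min_length: int = 1):
    """Return the tryptic peptides of `protein` in sequence order.

    `missed_cleavages` is the maximum number of uncleaved sites allowed
    within one peptide.
    """
    pieces = [p for p in TRYPSIN_SITE.split(protein.upper()) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R-P bond is not cleaved.
    print(tryptic_digest("MKAARPLKGR"))  # ['MK', 'AARPLK', 'GR']
```

Filtering with `min_length` mimics the fact that very short peptides are rarely informative for identification.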


Lys-C Lys-C is a serine endoproteinase that 
selectively hydrolyzes the C-terminal amide and 
imido bond of lysine residues. Lys-C is stable in 
strong denaturing conditions (e.g., 8 M urea, 1%
SDS and 40% acetonitrile) improving the hydro- 
lysis of structured regions not accessible under 
non-denaturing conditions. Lys-C specificity for a 
single amino acid reduces the number of cleavage 
sites and generates longer peptides than trypsin. 
By tandem digestion with Lys-C in strong 
denaturing conditions followed by trypsin diges- 
tion in milder conditions, the number of long 
peptides is reduced, and their MS identification 
increased [5, 6, 8]. 


Glu-C Glu-C is a serine endoproteinase that
selectively cleaves at the C-terminal side of glutamic
acid residues. Specificity is maximized in
ammonium bicarbonate and ammonium acetate
buffers. In phosphate buffer, selectivity is reduced
and hydrolysis occurs at both glutamic and
aspartic acid residues. A proline imido bond
generates a missed cleavage site. The catalytic
activity of Glu-C is optimal at pH 8 and is
retained in mild denaturing conditions (e.g., 2 M
urea, 1 M guanidine chloride, 0.1% SDS and 20%
acetonitrile) [9-12].

Due to the high selectivity and the missed 
cleavages induced by proline at the C-terminal 
side, Glu-C digestion potentially generates long 
peptides with weak ionizability. However, if 
combined with Lys-C and trypsin, Glu-C allows 
the generation of shorter peptides with improved 
ionization and fragmentation properties. 


Asp-N Asp-N is a zinc metalloproteinase that
primarily hydrolyzes peptide bonds at the
N-terminal side of aspartic acid residues and of
cysteine residues oxidized to cysteic acid. A secondary
cleavage activity towards glutamic acid residues is
known, but its hydrolysis rate is lower than that
at aspartic acid residues. Asp-N activity is
maximal at 37 °C in the pH range of 6-8. Mild
denaturing conditions such as 1 M urea, 2 M
guanidine hydrochloride, 0.1% SDS, 2% CHAPS and
10% acetonitrile are tolerated [13-15].


Pepsin Pepsin is a relatively nonspecific
endopeptidase that cleaves predominantly at the C
terminus of aromatic residues such as
phenylalanine, tyrosine, and tryptophan. Activity
at other residues is known but with variable
efficiency. The optimal pH is between 1 and 3, and
the enzyme is permanently inactivated above pH 6.
Because of the low selectivity of pepsin, the number
of short peptides generated is high, and for this reason it
is used for specific purposes (e.g., hydrogen-deuterium
exchange experiments). Pepsin can be
used free in solution or immobilized on resin
supports. Immobilization increases the stability
of the enzyme against autolysis, preventing
contamination of the sample with pepsin peptides
[16-19].


3.2.4 Multiple Enzymatic Digestion 
Digestion with more than one enzyme allows the 
generation of a larger number of shorter peptides. 
This potentially increases the coverage and the 
number of the identified proteins. Examples of 
combined digestion with Lys-C in strong 
denaturing conditions followed by mild trypsin 
digestion have been reported as effective methods 
to improve protein identification in proteomics 
experiments [20-22]. 


3.2.5 Chemical Hydrolysis 
In certain experimental conditions chemical
hydrolysis can be a valid alternative or
complementary method to enzymatic digestion.
Examples of chemicals and their target residues are
cyanogen bromide (CNBr), which cleaves at
non-oxidized methionine (Met) residues; formic
acid, which is selective for aspartic acid-proline
dipeptides (Asp-Pro); and hydroxylamine, which
cleaves asparagine-glycine bonds (Asn-Gly) [23].
The combination of enzymatic and chemical
hydrolysis can increase the number of detectable
peptides. Several protocols based on enzymatic
(trypsin) and chemical (CNBr) digestion show
increased sequence coverage and a higher number of
identified membrane proteins. This is attributable
to a regular and convenient distribution of
methionine residues in transmembrane domains [24].
The hydrolytic activity of CNBr requires
concentrated acid (e.g., acetic or formic acid)
and a high reaction temperature. These
conditions promote extensive acetylation or
formylation of the sample, which should be taken
into account in downstream analysis.
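
The three chemical cleavage rules listed above can be written as simple sequence patterns. The Python sketch below is illustrative only: the rule table and the function name are assumptions introduced here, the placement of the CNBr cut on the C-terminal side of methionine is standard chemistry rather than a statement from this chapter, and methionine oxidation and the side reactions mentioned above are not modeled.

```python
import re

# Zero-width split points for each reagent (requires Python 3.7+).
CHEMICAL_RULES = {
    "CNBr": re.compile(r"(?<=M)"),                # C-terminal side of Met
    "formic_acid": re.compile(r"(?<=D)(?=P)"),    # Asp-Pro bond
    "hydroxylamine": re.compile(r"(?<=N)(?=G)"),  # Asn-Gly bond
}


def chemical_cleave(protein: str, reagent: str):
    """Split `protein` at every site recognized by `reagent`."""
    rule = CHEMICAL_RULES[reagent]
    return [piece for piece in rule.split(protein.upper()) if piece]


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    for reagent in CHEMICAL_RULES:
        print(reagent, chemical_cleave("AMGDPKNGM", reagent))
```

Because these motifs are rare, the resulting fragments are typically long, which is why chemical cleavage is usually combined with an enzymatic digestion.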


3.3 Sample Analysis 

After sample preparation, peptide mixtures are 
ready for identification. Nowadays, the most 
commonly employed methods for protein and 
peptide analysis are based on liquid chromatogra- 
phy and tandem mass spectrometry (LC-MS/MS) 
and on the indirect analysis of the whole or 
pre-fractionated samples in solid phase by matrix 
assisted laser desorption ionization-time of flight 
mass spectrometry (MALDI-TOF-MS). 


3.3.1 LC-MS/MS 

The LC-MS/MS methods are commonly based on 
nano-LC systems coupled to mass spectrometers. 
These instruments allow flow rates of hundreds of 
nanoliters per minute and are directly interfaced 
with the mass spectrometer. The chromatographic 
separation is usually performed on reverse phase
columns with small internal diameters (e.g.,
75 µm), which improves chromatographic resolution
and enhances sensitivity. Elution is carried
out using increasing gradients of acetonitrile
in water at a constant concentration of formic acid.
Formic acid protonates the basic groups of amino
acid side chains and N-terminal amine groups,
thus neutralizing the negative charge of the
carboxylic groups. The elution period is proportional to
both sample complexity and mass spectrometry
acquisition speed. The amount of sample required
for identification depends on the complexity of
the sample, the instrumental set-up and the
experimental needs [25].


3.3.2 MALDI-TOF-MS 

MALDI-TOF-MS is an off-line method based
on the co-crystallization of analytes with a
light-absorbing matrix (e.g., sinapinic acid
or α-cyano-4-hydroxycinnamic acid) which,
upon laser excitation, induces ionization of the
analyte. The MALDI ion source is usually coupled
to time-of-flight (TOF) analyzers for the
determination of the intact mass of proteins and peptides.
Because it is simple and rapid to operate,
MALDI-TOF-MS is widely used in proteomics
laboratories for the analysis of both proteins and
peptides. However, the lack of a liquid
chromatographic separation step requires
prefractionation of samples before deposition
on the MALDI target [26, 27].


3.3.3 Mass Spectrometers and Data Acquisition


For proteomic investigations aimed at the
identification of unknown proteins, the most commonly
used mass spectrometers are Orbitrap hybrid
instruments such as LTQ-Orbitraps or
Quadrupole-Orbitraps. Targeted analyses on these
instruments enable the acquisition of high-resolution
and accurate mass data over a wide mass
range of unknown peptides and fragmentation
products. Triple-quadrupole (QqQ) and hybrid
quadrupole-time-of-flight (qTOF) analyzers
are commonly used for identification and
quantitation of known proteins.

Fig. 3.1 Different methods for data acquisition are available
using modern mass spectrometers. (a) Summary of a
data dependent acquisition (DDA) experiment on a hybrid
Quadrupole-Orbitrap instrument. Peptides eluting from
liquid chromatography are ionized by electrospray and
enter the mass spectrometer. A high-resolution full scan
of the peptide ions is acquired in the
Orbitrap analyzer and peaks are ranked by intensity. The n
most intense peaks (e.g., the first three in the picture) are
further analyzed in the following steps. In sequential
events each ion is selected in the quadrupole (MS1) and
fragmented in the collision cell. A full scan of the
generated fragments is acquired in the Orbitrap (MS2),
and MS/MS spectra are collected. (b) Selected reaction
monitoring (SRM) on a triple-quadrupole instrument. After
electrospray ionization, a list of parent ions is cyclically
scanned. Each peptide ion is filtered by the first
quadrupole (MS1) and fragmented. Specific fragments of the
parent ions are then sequentially monitored by the third
quadrupole (MS2) and their intensities recorded (MS/MS
spectra). The extracted ion chromatogram of each transition
is used for quantitative purposes. (c) Summary of the
theoretical positive ions generated by fragmentation of a
positively charged tetrapeptide and their commonly
accepted nomenclature [42, 43]

Methods for mass spectrometric data acquisi- 
tion are designed and tailored to improve the 
identification of unknown proteins or the quanti- 
tative determination of known proteins. 

Protein identification methods are usually 
based on data dependent acquisition (DDA) 
experiments (Fig. 3.1a). The eluted peptides
reach the ion source where ionization and evapo- 
ration occur. Ions are accelerated by an electrical 
field and collected into the mass spectrometer. A 
full MS scan is acquired, and the first n most 
intense peaks detected are shortlisted for 
MS/MS analysis. Each MS/MS experiment starts 
by filtering and accumulating parent ions by m/z 
ratios before fragmentation. A full MS scan of 
fragmented peptides is then acquired before the 
next MS/MS step starts. 
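
The precursor selection at the heart of a DDA cycle can be sketched as a simple ranking of the full-scan peaks. The Python code below is a simplified illustration, not an instrument implementation: the `Peak` type, the function name and the optional exclusion list are assumptions made here (the exclusion of already fragmented ions is common practice but is not described in this chapter).

```python
from typing import Iterable, List, NamedTuple, Sequence


class Peak(NamedTuple):
    mz: float         # mass-to-charge ratio from the full MS scan
    intensity: float  # signal intensity in arbitrary units


def select_precursors(
    ms1_peaks: Iterable[Peak],
    top_n: int = 3,
    excluded_mz: Sequence[float] = (),
    tolerance: float = 0.01,
) -> List[Peak]:
    """Pick the `top_n` most intense peaks for MS/MS.

    Peaks within `tolerance` of any m/z in `excluded_mz` are skipped,
    which mimics excluding ions that were already fragmented.
    """
    candidates = [
        peak for peak in ms1_peaks
        if not any(abs(peak.mz - mz) <= tolerance for mz in excluded_mz)
    ]
    ranked = sorted(candidates, key=lambda peak: peak.intensity, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical full scan, for illustration only.
    scan = [Peak(400.2, 1e5), Peak(523.3, 8e5), Peak(610.1, 3e5), Peak(745.4, 5e5)]
    for peak in select_precursors(scan, top_n=3):
        print(peak.mz, peak.intensity)
```

Each selected precursor would then be isolated, fragmented and recorded as one MS/MS spectrum before the next full scan starts.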

For quantitative investigations of known 
proteins, selected reaction monitoring (SRM) 
approaches are preferable over DDA experiments 
(Fig. 3.1b). SRM methods constantly monitor a 
defined set of parent ions and their transitions 
regardless of their intensity. This allows data col- 
lection of monitored ions at a constant frequency 
and reduces the bias generated by sample compo- 
sition and by contaminants. The setting up of an 
SRM method requires previously acquired infor- 
mation about parent ions and transitions for each 
monitored peptide. This data is usually acquired 
by an initial DDA analysis of the sample. 
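
A minimal sketch of how an SRM method might be represented is given below. The `Transition` record, the dwell-time model (cycle time equal to the number of transitions multiplied by a fixed dwell time, ignoring instrument overhead), and all m/z values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    peptide: str          # label of the monitored peptide
    precursor_mz: float   # parent ion m/z
    fragment_mz: float    # fragment ion m/z


def cycle_time(transitions, dwell_s):
    """Time needed to measure every transition once, in seconds."""
    return len(transitions) * dwell_s


def sampling_frequency(transitions, dwell_s):
    """How often each transition is revisited, in Hz."""
    return 1.0 / cycle_time(transitions, dwell_s)


# Placeholder values, not taken from a real assay.
method = (
    [Transition("PEPTIDE_A", 500.0, f) for f in (300.0, 400.0, 600.0)]
    + [Transition("PEPTIDE_B", 650.0, f) for f in (350.0, 450.0, 700.0)]
)
print(cycle_time(method, dwell_s=0.02))
```

In this simplified model the sampling frequency depends only on the length of the transition list and the dwell time, never on signal intensity, which reflects the constant-frequency monitoring described above.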


3.3.4 Peptide Ionization

Mass spectrometers determine the m/z of analytes 
in the gas phase. Vacuum is a prerequisite for 
mass spectrometric analysis as well as for the 
ionization of the analytes. 

The MS analysis of peptides in solution or 
co-crystallized with MALDI matrix requires 
their transfer to the gas phase from liquid and 
solid phases, respectively. Macromolecules like 
proteins and peptides are not volatile. Their
evaporation and ionization require a good balance 
of energy to prevent degradation and loss of 
intrinsic information. The lack of methods and 
technologies to achieve this task has for many 
years been a bottleneck for the MS analysis of 
proteins and peptides. 

After a long development time, in the late 
1980s, John B. Fenn and Koichi Tanaka reported 
two methods for mass spectrometric analyses of 
biological macromolecules, namely Electrospray 
Ionization (ESI) and the Matrix Assisted Laser 
Desorption Ionization (MALDI). These two ioni- 
zation methods allowed the effective ionization 
of polypeptides preserving their sequence and 
introduced mass spectrometry to the field of pro- 
tein and peptide research. For their revolutionary 
discoveries Fenn and Tanaka were awarded the 
Nobel Prize in chemistry in 2002 [28-30]. 

ESI is based on the nebulization of the sample 
solution through a capillary tube against a high 
potential electric field and high temperature. The 
charged droplets generated gradually evaporate 
and the volume contraction increases the charge 
density on the drops. When the balance between 
repulsive and cohesive forces is lost in favor of 
repulsions, the droplets fall apart releasing 
charged analytes in the gas phase ready to enter 
the mass-spectrometer. ESI is usually
coupled to liquid chromatography [31, 32]. 
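
The point at which Coulomb repulsion overcomes surface tension is commonly described by the Rayleigh limit. The expression below is the standard textbook form and is given as background; it is not taken from this chapter, and the symbols are introduced only for this illustration.

$$q_R = 8\pi\sqrt{\varepsilon_0\,\gamma\,R^{3}}$$

Here $q_R$ is the maximum charge that a droplet of radius $R$ and surface tension $\gamma$ can carry, and $\varepsilon_0$ is the vacuum permittivity. As solvent evaporates, $R$ shrinks at roughly constant charge, so the droplet eventually reaches this limit and breaks apart.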

MALDI is based on the laser excitation of a 
solid matrix containing the sample dispersed 
within it. There is still much debate about the 
exact mechanism of ionization. During laser abla- 
tion, the matrix absorbs laser light converting it to 
heat in a small area, causing a plume of hot gas 
containing ions in an excited state to be ejected 
from the dried spot. In a secondary process these 
molecules ionize the analytes [33, 34]. 


3.3.5 Peptide Fragmentation 

In addition to peptide ionization, another impor- 
tant step in peptide-based mass spectrometry is 
the peptide fragmentation [35]. The peptide ions 
generated in the ion source of the mass spectrom- 
eter are usually referred to as parent ions. The 
peptide sequence is determined by fragmentation 
of parent ions by collision with an inert gas as in
Collision Induced Dissociation (CID) and in 
Higher energy Collision induced Dissociation 
(HCD) [36, 37] or by direct or indirect interaction 
with electrons as in Electron Capture Dissociation 
(ECD) and Electron Transfer Dissociation (ETD), 
respectively [38, 39]. 

Both CID and HCD methods are based on the 
collision between accelerated parent ions and 
inert gas molecules in a dedicated collision cell. 
The energy released by collision is absorbed by 
the parent ion increasing its internal energy and 
inducing the rupture of the most labile bonds. The 
amount of energy required for fragmentation 
depends on the size and charge of the parent ions. 

Experimental evidence supports the involvement
of a proton at the cleavage site in the
generation of fragment ions, as summarized
in the mobile proton model [40, 41]. Protons
responsible for cleavage are thought to be
bound to basic side-chains of the parent ion's
amino acids (e.g., arginine and lysine) and to be
mobilized by the collision energy towards the 
backbone heteroatoms where they catalyze the 
cleavage. The amount of collisional energy nec- 
essary to “mobilize” the proton is proportional to 
the basicity of the protonated side chains and its 
calibration is extremely important for the genera- 
tion of an appropriate number of meaningful 
fragments avoiding over-fragmentation that 
generates the less informative internal fragments. 

The length of parent ions is extremely impor- 
tant for the generation of informative MS/MS 
spectra. As a rule of thumb, multiply charged
ions of peptides containing 6-20 amino acids
generate MS/MS spectra suitable for sequence 
assignment. Spectra of shorter peptides do not 
provide enough fragments for a confident assign- 
ment of each amino acid and, conversely, the 
complexity of MS/MS spectra of longer peptides 
is too high for accurate data interpretation and 
assignment. 

Each of the above-mentioned methods 
displays preferential patterns of fragmentation. 
Roepstorff and Fohlman in 1984 and Johnson in 
1987 proposed a conventional nomenclature for 
fragments based on which bond is broken along 
the peptide backbone and which fragment retains
the charge, as summarized in Fig. 3.1c
[42, 43]. When the charge is on the N terminus,
ions are defined as a, b or c. Otherwise, if the
charge is retained on the C terminus, the ions are
named x, y or z. Fragments a and x are generated
by the rupture of the bond between the α-carbon
and the carbonyl group, b and y when the peptide
bond is broken, and c and z are generated by the
cleavage of the bond between amide nitrogen and
α-carbon.
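
As an illustration of the nomenclature, the sketch below computes the m/z values of singly charged b and y ions for a peptide. It uses standard monoisotopic residue masses; the function name `b_y_ions` and the dictionary layout are assumptions made for this example, and modifications and higher charge states are ignored.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(peptide):
    """Return singly charged b and y ion m/z values keyed by ion name."""
    residues = [MONO[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        # b ions keep the N-terminal i residues.
        ions[f"b{i}"] = sum(residues[:i]) + PROTON
        # y ions keep the C-terminal i residues plus one water.
        ions[f"y{i}"] = sum(residues[-i:]) + WATER + PROTON
    return ions


for name, mz in sorted(b_y_ions("PEPTIDE").items()):
    print(f"{name}: {mz:.4f}")
```

Complementary ions such as b2 and y5 of a seven-residue peptide add up to the neutral peptide mass plus two protons, which is a convenient consistency check when annotating spectra.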

If more than one backbone cleavage happens
on the same peptide, an internal fragment of the
parent ion is generated. These are usually formed
by a combination of b and y cleavages, which
generates amino-acylium ions, and less frequently
of a and y cleavages, which provide amino-immonium
ions. The shortest internal fragments, generated by
a and y cleavage around a single residue, are
immonium ions.

CID fragmentation of positively charged ions 
usually generates b and y fragments and if the 
molecular ion contains R, N, Q or K residues in
its sequence, loss of ammonia (-17 Da) is detectable.
If the sequence includes S, T, E or D
residues, loss of water (-18 Da) is detected
[44]. HCD fragmentation generates a pool of
fragment ions similar to those generated by CID 
with the exception of ammonia and water losses. 
ETD and ECD fragmentation methods generate 
MS/MS spectra mainly populated by c, y, z + 1, 
and z + 2 ions. 


3.4 Data Analysis 

The sequence of a peptide is determined by anal- 
ysis of the acquired MS and MS/MS spectra. Two 
main analytical approaches and a number of 
algorithms are available for this purpose. 

In one case, the analysis follows a de-novo 
sequence-independent interpretation of spectral 
data. Briefly, the de-novo sequencing algorithms 
interpret the MS/MS spectra and evaluate the 
mass difference between peaks predicting the 
whole sequence whose mass is expected to 
match the mass of the parent ion (i.e., MS m/z 
value). The applicability of these methods is lim- 
ited by the generation of non-predictable 
fragments (e.g., excision of side-chains, neutral
losses, and internal fragmentation) as well as
uneven fragmentation along the polypeptide
chain due to preferential cleavages (e.g., in the
presence of proline residues). These additional peaks
lead to an increase of MS/MS spectra complexity 
and ambiguity reducing the de-novo sequencing 
efficiency [45]. 
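
The core idea of de-novo interpretation, reading residues from mass differences between consecutive peaks, can be sketched as follows. The function `read_ladder` and its tolerance are assumptions made for illustration; the sketch expects a clean, complete ion series and is not one of the published algorithms.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Translate gaps between consecutive peaks of one ion series into residues.

    peaks: m/z values of a singly charged ion series, sorted ascending.
    Returns one entry per gap: matching residue(s), or "?" if none fits.
    """
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hits = [aa for aa, mass in MONO.items() if abs(mass - delta) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls


# b1 to b6 of the illustrative peptide PEPTIDE.
b_series = [98.06004, 227.10263, 324.15539, 425.20307, 538.28713, 653.31407]
print(read_ladder(b_series))
```

Isobaric residues such as isoleucine and leucine cannot be distinguished by mass difference alone, and any missing or extra peak breaks the ladder, which reflects the limitations discussed above.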

The second approach is based on an
uninterpreted search of MS and MS/MS data
against a previously generated peptide sequence
database. This method is the one most commonly
applied to large-scale proteomics experiments. Each
experimental MS/MS spectrum is compared to a
library of predicted spectra generated from
sequences of in-silico digested proteins [46]. A
number of licensed and free on-line and off-line
search engines are nowadays available, e.g.,
Sequest [47], Mascot [48], and X!Tandem
[49, 50]. In addition to the m/z value of parent
ions, the MS/MS data, and the database of 
sequences, additional parameters are required for 
searching, e.g., the MS and MS/MS match 
tolerances and the side-chain modifications to be 
accounted for. 

The databases of sequences used can be proteome-wide
(e.g., UniProt or SwissProt databases)
or a smaller subset of selected sequences tailored 
to the experimental purpose. The sequences 
included in the database are in-silico digested 
mimicking the activity of proteases used for the 
sample preparation (i.e., cleavage site specificity, 
missed cleavages, C-terminal or N-terminal 
cleavage). The generated peptides are then 
modified according to the list of fixed or
variable modifications to be searched (e.g.,
Cys-carbamidomethyl and Met oxidation). Each
generated peptide is indexed according to its 
molecular weight and the MS/MS fragmentation 
is predicted. 
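
A minimal sketch of in-silico digestion is shown below. It assumes the commonly used trypsin rule (cleavage C-terminal to K or R, except when followed by P) and a length filter of 6-20 residues; the function name `digest`, its parameters, and the test sequence are hypothetical, and modifications are not handled.

```python
def digest(protein, missed=1, min_len=6, max_len=20):
    """Return the set of tryptic peptides with up to `missed` missed cleavages."""
    # Cut positions: after K or R unless the next residue is P.
    cuts = [0]
    cuts += [
        i + 1
        for i in range(len(protein) - 1)
        if protein[i] in "KR" and protein[i + 1] != "P"
    ]
    cuts.append(len(protein))

    peptides = set()
    for i in range(len(cuts) - 1):
        # Join 1 to (missed + 1) consecutive fully cleaved pieces.
        for j in range(i + 1, min(i + 2 + missed, len(cuts))):
            peptide = protein[cuts[i]:cuts[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


# Artificial sequence chosen to show the proline rule.
sequence = "AAAAAAKCCCCCCRPDDDDDDKEEEEEE"
print(sorted(digest(sequence, missed=0)))
```

The R-P bond in the example sequence is left intact, so the second peptide spans it. Each resulting peptide would then be modified, weighed, and fragmented in silico as described above.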

The searching process is based on the cross-
correlation of the experimental parent ion m/z value
and fragment peaks with the theoretical values
included in the database. The matching of
experimental and theoretical MS and MS/MS
values is allowed within a defined tolerance.
Based on the number of matched peaks and the deviation
between theoretical and experimental values,
the match is scored, and the assigned peptides
ranked.
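
The matching logic can be sketched with a deliberately simple score, the number of theoretical fragments found in the spectrum within a ppm tolerance. This is an assumption made for illustration; the scoring functions of real search engines are more elaborate, and all names and values below are hypothetical.

```python
def within_ppm(observed, theoretical, ppm):
    """True if observed lies within the given ppm window of theoretical."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


def count_matches(spectrum, fragments, ppm):
    """Number of theoretical fragments matched by at least one observed peak."""
    return sum(
        any(within_ppm(peak, fragment, ppm) for peak in spectrum)
        for fragment in fragments
    )


def rank_candidates(precursor_mz, spectrum, candidates, ms1_ppm=10.0, ms2_ppm=20.0):
    """Score candidates that pass the precursor filter; best score first.

    candidates: dict name -> (theoretical precursor m/z, list of fragment m/z).
    """
    scored = [
        (count_matches(spectrum, fragments, ms2_ppm), name)
        for name, (precursor, fragments) in candidates.items()
        if within_ppm(precursor_mz, precursor, ms1_ppm)
    ]
    return sorted(scored, reverse=True)


# Placeholder values, not real peptides.
candidates = {
    "A": (500.000, [100.0, 200.0, 300.0]),
    "B": (500.001, [100.0, 250.0, 350.0]),
    "C": (510.000, [100.0, 200.0, 300.0]),
}
observed = [100.001, 200.002, 300.0, 420.0]
print(rank_candidates(500.0005, observed, candidates))
```

Candidate C is discarded by the precursor tolerance before any fragment is compared, which shows why the MS and MS/MS tolerances are both required search parameters.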

Protein identification is based on the matching 
and assignment of identified peptides to the 
sequences collected in the protein database. An 
identified peptide can be assigned to a single
protein (i.e., unique or exclusive peptides) or
to several proteins in case of sequence homology
(i.e., shared or non-exclusive peptides). The
assignment of exclusive peptides to a protein
permits its identification, and the assignment of
non-exclusive ones helps to increase its
sequence coverage, thus increasing the confidence
of identification.
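
The distinction between exclusive and shared peptides can be illustrated with a short sketch. The function `classify_peptides`, the plain substring matching, and the toy sequences are assumptions made for this example; protein inference in real software involves additional grouping rules.

```python
def classify_peptides(peptides, proteins):
    """Split identified peptides into unique and shared ones.

    proteins: dict protein name -> sequence.
    Returns (unique, shared): unique maps peptide -> protein name,
    shared maps peptide -> sorted list of protein names.
    """
    unique, shared = {}, {}
    for peptide in peptides:
        owners = sorted(name for name, seq in proteins.items() if peptide in seq)
        if len(owners) == 1:
            unique[peptide] = owners[0]
        elif len(owners) > 1:
            shared[peptide] = owners
    return unique, shared


# Toy sequences of two homologous subunits (hypothetical).
proteins = {
    "subunit_1": "MAAAAAKLLLLLLKGGGGGGR",
    "subunit_2": "MCCCCCKLLLLLLKHHHHHHR",
}
unique, shared = classify_peptides(["AAAAAK", "LLLLLLK", "HHHHHHR"], proteins)
print(unique)
print(shared)
```

Only the unique peptides discriminate between the two subunits; the shared peptide adds sequence coverage to both.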
Abstract


The characterization of a protein complex by 
mass spectrometry can be conducted at different
levels. Initial steps concern the qualitative
composition of the complex and subunit identi- 
fication. After that, quantitative information 
such as stoichiometric ratios and copy numbers 
for each subunit in a complex or super-complex 
is acquired. Peptide-based LC-MS/MS offers a 
wide number of methods and protocols for 
the characterization of protein complexes. 
This chapter concentrates on the applications 
of peptide-based LC-MS/MS for the qualita- 
tive, quantitative, and structural characteriza- 
tion of protein complexes focusing on subunit 
identification, determination of stoichiometric 
ratio and number of subunits per complex as 
well as on cross-linking mass spectrometry and 
hydrogen/deuterium exchange as methods for 
the structural investigation of the biological 
assemblies. 
4.1 Introduction

Peptide-based LC-MS/MS methods are effective 
tools for the acquisition of information about pro- 
tein composition and protein-protein interactions 
in multisubunit complexes through different 
experimental protocols. As summarized in 
Fig. 4.1a, protein identification provides the qualitative
composition of the complex. Information
about stoichiometric ratios among subunits and 
their copy number in the active complex requires 
relative and absolute quantitative investigation, 
respectively (Fig. 4.1b, c). In addition to qualita- 
tive and quantitative information, peptide-based 
MS investigations allow the collection of structural 
insights about protein complexes such as protein- 
protein interactions, structurally ordered regions, 
and solvent accessible surfaces. As examples of 
structural methods that rely on peptide-based mass 
spectrometry, chemical cross-linking (Fig. 4.1d) 
and hydrogen/deuterium exchange will be
discussed [1-8]. 


4.2 Subunit Identification and Characterization


Subunit identification is the starting point for the
characterization of a protein complex. After purification,
protein complexes are usually checked for
retained biochemical activity. However, the
first step for characterization is the identification
of their exact protein composition. The available
information about subunit and protein sequences
depends on the way the sample is prepared (e.g.,
by extraction of native complexes from tissue or
expression of recombinant proteins and in-vitro
assembly of the complex). In the case of extraction
and purification from cells or tissues, the exact
composition of a protein complex is unknown
and strictly depends on the original source.
Non-homogeneous populations of purified
complexes are generated by different isoforms of
subunits, post-translational modifications, or labile
interactions between subunits. In these cases, an
accurate identification of each protein by sequencing
is extremely important. Where the complex is
assembled in vitro starting from purified recombinant
proteins, information about protein sequences
and post-translational modifications (PTM) of
subunits is already available. However, the quantitative
information, e.g., stoichiometry and number
of copies, of the whole assembly is still unknown.

Fig. 4.1 Hypothetical assembly of a protein complex composed of multiple copies of different subunits. Through peptide-based mass spectrometry it is possible (a) to identify the proteins that form the complex, (b) their stoichiometric ratio by relative quantification and (c) the absolute number of copies of each protein by absolute quantification. (d) Structural analysis by chemical cross-linking and mass spectrometry allows the mapping of protein-protein interactions

Protein identification follows the peptide-based 
MS protocol described in Chap. 3, which is 
summarized in Fig. 4.2a. Denaturation of the complex
and its subsequent fractionation are aimed at
improving the identification of all the proteins and
are particularly useful when complexes are composed
of many subunits. Before protein digestion,
the samples are incubated with reducing agents
(e.g., DTT or TCEP) to disrupt disulfide bonds.
The reverse reaction is prevented by alkylation of
sulfhydryl groups (e.g., with iodoacetamide) to 
irreversibly block the reactive thiols, thereby 
impeding the formation of new disulfide bonds. 
The reduction/alkylation step is omitted in cases 
where the determination of disulfide bonds is the 
goal of the characterization. Finally, after sample 
preparation, proteins are hydrolyzed, usually by a 
selective enzymatic digestion (e.g., with trypsin), 
to obtain peptides with suitable features for 
LC-MS/MS analysis. 

The MS and MS/MS data acquired allow pep- 
tide identification and assignment to protein 
sequences. These data also permit qualitative 
investigation of potential PTMs even though ded- 
icated protocols for sample preparation are neces- 
sary for a reliable detection of PTMs [9]. 


4.3 Quantitative Composition: Stoichiometry and Number of Subunits


To characterize the quaternary structure
of a protein complex or supramolecular
assembly, the stoichiometric ratios and number of
copies of each subunit have to be determined.

Native mass spectrometry is the gold standard 
method for the stoichiometry inference of purified 
and homogeneous complexes [10]. Native 
MS provides experimental masses of whole 
complexes and subcomplexes. The comparison 
and combination of experimental data with theoretical
masses allow the determination of the stoichiometric
ratio for each subunit. Unfortunately,
native-MS experiments require instrumentation
and experimental conditions that restrict the technique
to specialized laboratories, preventing its
widespread use as a routine method.
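
The comparison of an experimental complex mass with theoretical subunit masses can be sketched as a small enumeration. The function `candidate_stoichiometries`, the mass tolerance, the copy-number limit, and the subunit masses are all assumptions made for illustration.

```python
from itertools import product


def candidate_stoichiometries(measured, subunit_masses, max_copies=6, tol=50.0):
    """List copy-number combinations whose summed mass matches the measured mass.

    measured:       experimental mass of the intact complex (Da).
    subunit_masses: dict subunit name -> theoretical mass (Da).
    tol:            allowed absolute mass deviation (Da), a hypothetical value.
    """
    names = list(subunit_masses)
    hits = []
    for copies in product(range(max_copies + 1), repeat=len(names)):
        total = sum(n * subunit_masses[name] for n, name in zip(copies, names))
        if total > 0 and abs(total - measured) <= tol:
            hits.append(dict(zip(names, copies)))
    return hits


# Hypothetical subunit masses and measured complex mass.
masses = {"A": 30000.0, "B": 45500.0}
print(candidate_stoichiometries(105520.0, masses))
```

When several combinations fall within the tolerance, the masses of subcomplexes are needed to resolve the ambiguity.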

A more accessible alternative to native mass 
spectrometry is offered by protocols of quantita- 
tive peptide-based MS that can be applied in 
any proteomics laboratory. Using bottom-up 
approaches, the stoichiometry of a protein com- 
plex is identified by relative quantification com- 
paring the concentrations of each protein in the 
sample. On the other hand, absolute quantification 
is required for the identification of the number of 
copies of each subunit in the whole biological 
assembly. Two different methods for quantitative 
investigation of protein complexes are commonly 
used. One is based on classical quantitative MS 
methods (e.g., targeted mass spectrometric analy- 
sis using labeled peptides) while the second 
method is a label-free proteomics approach, 
based on data dependent MS acquisition [7]. 

A way to determine the stoichiometric ratios in
protein complexes is by comparing the concentration
of each subunit. Mass spectrometry does
not allow a direct comparison of protein
concentrations. The MS intensity of a peptide is
proportional to its concentration, but it also depends on the
amino acid sequence, which determines the ionization
efficiency. For this reason, MS intensities of
different peptides cannot be compared without
a preliminary normalization against those of
standard peptides. Therefore, for quantitative
purposes the definition and preparation of a 
quantified set of reference peptides is necessary. 
The ionization efficiency is not affected by isotopic
composition, which makes isotope-labeled
peptides the best reference compounds for quantitative
purposes (Fig. 4.2b).

The standard peptides can be produced in 
different ways and experimental conditions. 
Methods like SILAC (stable isotope labeling 
with amino acids in cell culture) or ¹⁵N labeling
allow the expression of whole isotopically labeled 
proteins [11, 12]. In this way each peptide of the 
complex is produced as a labeled isoform defin- 
ing the best condition for quantitative investiga- 
tion. The inclusion of specifically labeled amino 
acids in the protein limits these methods only to 
in-vitro expressed complexes. For complexes 
extracted from tissues, or if the expression of
whole labeled proteins is unachievable, a non-comprehensive
set of reference peptides for each
protein must be defined, designed, and produced,
e.g., by chemical synthesis or expression.

Uniqueness and high ionization efficiency are 
key requirements for reference peptides. The 
selection of the best candidates as standards 
follows a preliminary qualitative MS screening. 
At least two unique peptides displaying high 
ionization efficiency for each protein must be 
identified (Fig. 4.2b). For each peptide three 
MS/MS transitions are recorded in a Selected 
Reaction Monitoring (SRM) data acquisition 
method [13-15]. 

The addition of standards in quantitative anal- 
ysis requires accuracy and precision. To optimize 
this step, several methods for standard prepara- 
tion and addition have been developed and 
applied to real experimental conditions [5, 16]. 

In 2003 Gerber and colleagues reported an 
absolute quantification protocol named AQUA 
[17]. This is based on spiking the sample with 
known amounts of isotopically labelled reference 
peptides (termed AQUA peptides). As mentioned 
above, isotopically labelled standards allow a
direct comparison between endogenous and standard
peptides. By normalizing the intensities of
endogenous peptides to those of the standards it is
possible to measure the relative concentration of
each subunit of a complex. This approach has two
main requirements for a reliable quantification
based on AQUA peptides. First, the precise addition
of standards for each protein at a known
equimolar concentration. Second, the complete
digestion of all the proteins. These requirements
in turn represent the limitations and pitfalls of this
quantitative method.
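
The normalization against spiked standards can be written out as a short calculation. The sketch assumes that the endogenous amount equals the light-to-heavy intensity ratio multiplied by the spiked amount, and that peptides of the same subunit are averaged; the function names and intensity values are hypothetical.

```python
def endogenous_amount(light, heavy, spiked_fmol):
    """Amount of endogenous peptide from the light/heavy intensity ratio."""
    return light / heavy * spiked_fmol


def stoichiometry(measurements):
    """Relative subunit ratios, normalized to the least abundant subunit.

    measurements: dict subunit -> list of (light, heavy, spiked_fmol),
                  one tuple per reference peptide.
    """
    amounts = {
        subunit: sum(endogenous_amount(*m) for m in peptides) / len(peptides)
        for subunit, peptides in measurements.items()
    }
    reference = min(amounts.values())
    return {subunit: amount / reference for subunit, amount in amounts.items()}


# Invented intensities for two reference peptides per subunit.
data = {
    "A": [(2e6, 1e6, 100.0), (4e5, 2e5, 100.0)],
    "B": [(1e6, 1e6, 100.0), (3e5, 3e5, 100.0)],
}
print(stoichiometry(data))
```

Any error in the spiked amount or any incomplete digestion propagates directly into the computed ratio, which is why the two requirements listed above are critical.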

QconCAT (Quantitative conCatamers) [18] is 
an improved approach for the production and 
addition of equimolar amounts of standards 
aimed at overcoming the mentioned limitations 
of AQUA peptides. This is based on the genera- 
tion of a concatemer of reference peptides 
separated by a specific proteolytic site (e.g., tryp- 
sin). The in-vitro or in-vivo expression of 
concatenated peptides allows the generation of 
an artificial isotopically labeled QconCAT poly- 
peptide that after quantitation is added to the 
undigested complex. The enzymatic digestion of 
the sample generates both endogenous (unla- 
beled) and artificial (labeled) peptides. If the 
digestion proceeds to completion, an equal con- 
centration of all standard peptides in the mixture 
is achieved. The accuracy of the analysis depends 
on the digestion efficiency of both sample and 
QconCAT peptide. The lower the number of
missed cleavage sites in both the sample and
QconCAT peptides, the higher the accuracy
of the analysis. However, different hydrolytic
rates for endogenous peptides and the artificial 
QconCAT protein are expected due to different 
sequence and secondary/tertiary structure that can 
interfere with the reaction kinetics. 

In order to guarantee equal amounts of free 
reference peptides after digestion, a simplified 
version of QconCAT based on tandem peptides 
was developed [19]. In this method two represen- 
tative peptides are fused together preserving a 
proteolytic site and are added to the protein com- 
plex before digestion. The enzymatic digestion 
generates an equal amount of both peptides 
regardless of the hydrolytic efficiency. 

An alternative quantitative approach aimed at
improving the accuracy by assuring equimolar
amounts of reference peptides is EtEP
(Equimolarity through equalizer peptide)
[20]. In this method, reference labeled peptides
are synthesized fused to an equalizer peptide,
with a proteolytic site between the two.
Before the addition to the sample, the fused 
peptides are digested with trypsin and quantified 
by peptide-based MS using the equalizer peptide 
to normalize the MS intensities. After adjusting 
the concentration of each peptide, a standard 
solution at equimolar concentrations can be 
prepared and used for tandem mass tag (TMT)
[21], isobaric tag for relative and absolute quantitation
(iTRAQ), and mass-differential tag for
relative and absolute quantitation (mTRAQ)
investigations of the protein concentrations in
the complex sample.

Label-free methods, which are based on
LC-MS/MS data dependent acquisition, are an
alternative to targeted quantitative MS approaches.
Intensity-based absolute quantification (iBAQ) 
[22] and exponentially modified protein abun- 
dance index (emPAI) [23] are widely applied 
methods for label-free proteomics and have been 
used effectively to determine the stoichiometry of 
complex subunits. 

The intensity-based absolute quantification 
method (iBAQ) relies on the correlation between 
the MS intensity of detected peptides and the 
protein concentration. A standard solution composed
of a carefully chosen selection of quantified
proteins covering a wide range of concentrations
is added to the sample (e.g., the commercially 
available Universal Proteomics Standard). The 
iBAQ index for each protein identified by
LC-MS/MS analysis is then calculated as the
ratio of the sum of the mass intensities of all
the peptides assigned to an identified protein
(I_observed) and its theoretical number of peptides
generated by in-silico digestion (N_observable),
Eq. (4.1).

$$\mathrm{iBAQ} = \frac{\sum I_{\mathrm{observed}}}{N_{\mathrm{observable}}} \qquad (4.1)$$

The linear regression model obtained by
correlating the log-transformed iBAQ
indices versus the molar amount of the standard
proteins allows the extrapolation of the absolute 
quantification of each protein in the sample. 
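
A minimal sketch of the iBAQ calculation and calibration is given below. It assumes that both the iBAQ values and the known amounts of the standards are log10-transformed before an ordinary least-squares fit; the function names and the numerical values are hypothetical.

```python
import math


def ibaq(peptide_intensities, n_observable):
    """Summed peptide intensity divided by the number of observable peptides."""
    return sum(peptide_intensities) / n_observable


def calibrate(standards):
    """Fit log10(amount) = slope * log10(iBAQ) + intercept.

    standards: list of (ibaq_value, known_amount_fmol) for spiked proteins.
    """
    xs = [math.log10(value) for value, _ in standards]
    ys = [math.log10(amount) for _, amount in standards]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def estimate_amount(ibaq_value, slope, intercept):
    """Convert an iBAQ value into an absolute amount using the fitted line."""
    return 10 ** (slope * math.log10(ibaq_value) + intercept)


# Invented calibration points spanning two orders of magnitude.
slope, intercept = calibrate([(1e4, 1.0), (1e5, 10.0), (1e6, 100.0)])
print(estimate_amount(ibaq([1e5, 3e5, 2e5], 6), slope, intercept))
```

The ratio of the estimated amounts of two subunits then gives their stoichiometry, provided both fall within the concentration range covered by the standards.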
The exponentially modified protein abundance
index (emPAI) is based on a similar concept to
iBAQ but, instead of the MS intensities, it is
based on the number of peptides identified and
assigned. The protein abundance index (PAI,
Eq. 4.2) is calculated for each protein identified
as the ratio between the number of peptides
observed (N_observed) for an identified protein and
the total number of theoretical observable
peptides (N_observable) for the same protein.


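$$\mathrm{PAI} = \frac{N_{\mathrm{observed}}}{N_{\mathrm{observable}}} \qquad (4.2)$$

$$\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1 \qquad (4.3)$$

$$\text{Protein content (mol\%)} = \frac{\mathrm{emPAI}}{\sum \mathrm{emPAI}} \times 100 \qquad (4.4)$$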


The emPAI values (Eq. 4.3) are calculated as an 
index of the abundance of a protein. The sum of 
all the emPAI values can be considered as the 
overall amount of proteins in the sample. For 
this reason, by dividing the emPAI value of a
protein by the sum of the emPAI values of all
the proteins in the sample, the relative amount of
a given protein in the sample can be determined,
Eq. 4.4.
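
Equations 4.2 to 4.4 translate directly into code. The sketch below follows those definitions; the function names and the peptide counts are hypothetical.

```python
def empai(n_observed, n_observable):
    """emPAI = 10^PAI - 1, with PAI = N_observed / N_observable."""
    return 10 ** (n_observed / n_observable) - 1


def protein_content_mol_percent(counts):
    """Relative molar content of each protein from its emPAI value.

    counts: dict protein -> (n_observed, n_observable).
    """
    values = {name: empai(obs, total) for name, (obs, total) in counts.items()}
    overall = sum(values.values())
    return {name: 100.0 * value / overall for name, value in values.items()}


# Invented peptide counts for a two-subunit complex.
content = protein_content_mol_percent({"A": (10, 10), "B": (5, 5)})
print(content)
```

The ratio between the mol% values of two subunits is the emPAI-based estimate of their stoichiometry.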


4.4 Structural Investigation of Protein Complexes


In addition to the qualitative and quantitative anal- 
ysis for subunit identification and quantification, 
the contribution of mass spectrometry to the char- 
acterization of protein complexes extends to struc- 
tural biology. In recent years, cryo-electron 
microscopy (cryo-EM) has become, together with 
X-ray crystallography and NMR, one of the most
widely applied methods for structural characteriza- 
tion of protein complexes. The requirement of a 
monodisperse protein-protein complex in vitreous 
ice instead of a crystal has made cryo-EM a very 
attractive method in comparison to X-ray crystal- 
lography for the investigation of complexes 
difficult to crystallize (e.g., large membrane 
complexes). Mass spectrometry, through different 
approaches, is a supporting tool providing information
about protein-protein interactions and protein
folding (e.g., secondary and tertiary structure
or flexible regions) useful for the structural
modeling of protein complexes and the better
understanding of their dynamics [24-26].


4.4.1 Cross-Linking and Mass Spectrometry


For a long time, protein cross-linking was used to 
stabilize labile protein complexes during extraction
and purification. In recent years, the peptide-based
mass spectrometric analysis of cross-linked
proteins (XL-MS) has become a very popular
technique to support the structural analysis of
protein complexes, providing useful constraints 
for the structural modeling of cryo-EM, NMR or 
X-ray crystallography data [4, 27, 28]. 

Cross-linking experiments are based on the formation
of covalent bonds between the cross-linking
reagents and the amino acid side chains of the
proteins. Cross-linking reagents are composed of
two reactive groups spaced by a linker chain. The
reactive groups define the selectivity toward a certain
group of amino acids and the spacer length sets
the distance cut-off between the reaction centers on
the surface of cross-linkable proteins.
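
The use of the spacer length as a distance cut-off can be sketched as a check of identified cross-links against a structural model. The function `violated_restraints`, the residue labels, the coordinates, and the 30 Å cut-off are assumptions made for illustration; the appropriate cut-off depends on the reagent and on which atoms are measured.

```python
import math


def violated_restraints(coords, cross_links, cutoff):
    """Return cross-links whose residues are farther apart than the cut-off.

    coords:      dict residue label -> (x, y, z) in angstrom, from a model.
    cross_links: list of (residue_a, residue_b) pairs identified by XL-MS.
    cutoff:      maximum distance the cross-linker is assumed to span.
    """
    return [
        (a, b)
        for a, b in cross_links
        if math.dist(coords[a], coords[b]) > cutoff
    ]


# Invented coordinates for three lysine residues.
coords = {
    "A:K10": (0.0, 0.0, 0.0),
    "B:K55": (3.0, 4.0, 0.0),
    "B:K80": (0.0, 0.0, 40.0),
}
links = [("A:K10", "B:K55"), ("A:K10", "B:K80")]
print(violated_restraints(coords, links, cutoff=30.0))
```

Cross-links that violate the cut-off point to regions where the model is wrong or where the complex is flexible.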

Cross-linking reactions generate different 
types of products (i.e., intra-protein, inter-protein, 
and dead-end cross-links) as depicted in 
Fig. 4.1d. Intra-protein or inter-protein cross- 
links are formed when both the reactive sites of 
a cross-linker react with residues of the same 
protein or of two different proteins, respectively. 
On the other hand, dead-end cross-links 
(or mono-links) are generated when only a single 
reactive site of the cross-linker reacts with the 
protein and the other is hydrolyzed by buffer or 
reversible secondary reactions. 

The digestion of a cross-linked complex 
generates a mixture of peptides: free or unmodi- 
fied peptides, modified or dead-end peptides as 
well as intra- and inter-protein cross-linked 
peptides. Among the intra-protein peptides, cyclic
intra-protein cross-linked peptides (i.e., loop-
peptides) are observed when the cross-linker
bridges two amino acids that after digestion are
on the same peptide.
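
The expected masses of the different products can be sketched as simple sums. The example assumes a DSS-type linker, for which the bridge adds C8H10O2 (138.06808 Da) and a hydrolyzed dead-end adds one further water; the function names and peptide masses are hypothetical, and other reagents need their own mass shifts.

```python
DSS_BRIDGE = 138.06808   # C8H10O2 added by a DSS-type cross-link (assumed reagent)
WATER = 18.01056


def inter_peptide_mass(mass_a, mass_b, linker=DSS_BRIDGE):
    """Two peptides joined by one cross-linker molecule."""
    return mass_a + mass_b + linker


def loop_link_mass(mass, linker=DSS_BRIDGE):
    """One peptide with both reactive sites bound to the same peptide."""
    return mass + linker


def dead_end_mass(mass, linker=DSS_BRIDGE):
    """One peptide carrying a cross-linker whose free end was hydrolyzed."""
    return mass + linker + WATER


# Invented neutral peptide masses in Da.
print(inter_peptide_mass(1000.0, 1500.0))
print(loop_link_mass(1000.0))
print(dead_end_mass(1000.0))
```

Because an inter-peptide cross-link combines two peptides, its mass and charge are typically higher than those of linear peptides.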

The intra-protein and inter-protein cross-links 
are the most informative products provided by 
XL-MS experiments. However, their molar frac- 
tion is lower than that of free peptides. This is 
due to a number of factors such as incomplete 
cross-linking reaction, multiple peptide-peptide 
combinations, and cross-linking by-products 
(e.g., dead-end cross-links) generated during the 
reaction. The MS detection of cross-linked 
peptides is therefore complicated by their com- 
paratively lower concentration in the digests. To 
overcome this limitation, a strategy for their 
enrichment is encouraged. Two main differences 
between cross-linked and linear peptides are the 
higher molecular weight and the net charge. 
Therefore, these features can be exploited for 
enrichment protocols based on size exclusion 
(SEC) and strong cation exchange (SCX) chro- 
matography [29, 30]. A typical workflow for 
cross-linking and mass spectrometry investiga- 
tion is summarized in Fig. 4.2c. 

A number of cross-linking reagents are nowadays
commercially available, and the choice
depends on the experimental questions and technical
features required. They have been extensively
reviewed and classified according to their
features but their list is constantly updated [31].

The choice of the best cross-linking reagent for 
structural investigations is based on amino-acid 
selectivity, solubility and membrane permeabil- 
ity, length, and chemical features of linker chain 
important for enrichment, MS detection, and 
identification of cross-linked peptides. 

Amino-acid selectivity is required to restrict
the number of cross-linked peptides with the same
sequence but different connectivity. A larger number
of such isoforms translates into smaller molar
fractions of each isoform, higher complexity of mass
spectrometric data, and a wider database of potential
cross-links to search, all leading to exponentially longer time
for data analysis and less efficient identification.

Most of the cross-linkers commercially avail- 
able and commonly used for XL-MS investigations 
react with amino-acid side chains containing 
primary amine, hydroxyl groups, or carboxylic 
acids (Fig. 4.3). The hydrophilicity and the wide
distribution of these amino acids on protein
surfaces make them good targets for studying
protein-protein interactions.

[Figure: chemical structures and reaction schemes; labels legible in the figure include BS2G, DSS, DTSP, DSBU, DSSO, CBDPS, CDI, SDH, EDC, and DMTMM]

Fig. 4.3 Cross-linkers for mass spectrometry research. (a) Primary amine and hydroxyl specific cross-linking reagents based on NHS ester (a-f) and carbonyldiimidazole (g) reactive group. (b) Reaction products of activated NHS esters. (c) Reaction products of carbonyldiimidazole (CDI). (d) Example of

The most commonly reported reagents for 
structural cross-linking experiments are the 
homobifunctional N-hydroxysuccinimide (NHS) 
esters (Fig. 4.3a, a-f). Introduced as cross- 
linking reagents in the mid-1970s, this class of 
compounds reacts with nucleophiles, preferen- 
tially primary amines (e.g., lysine side chains 
and amino termini of proteins) to form stable 
amides (Fig. 4.3b, a) [32]. However, a secondary 
reactivity towards the hydroxyl group (e.g., ser- 
ine, threonine, and tyrosine side chains) allows 
the formation of stable esters detectable by MS 
analysis (Fig. 4.3b, b) [33, 34]. In addition, NHS 
esters promptly react with both sulfhydryl and 
imidazolyl groups of cysteine and histidine 
residues, forming reversible covalent bonds not 
detectable by XL-MS workflows. 

NHS esters react in aqueous buffers free of
primary amines, ammonium salts, and other reactive
nucleophiles (e.g., thiols and imidazole). A slightly
basic pH between 7 and 8 increases the reactivity with
the protein but also the rate of hydrolysis
catalyzed by hydroxide ions. The lower the temperature,
the slower the reaction kinetics, so incubation time and
temperature must be balanced against the stability of
the protein complex. Two hours at 4 °C or 30 min at
37 °C are common conditions for unstable and very
stable complexes, respectively.

Similar selectivity to NHS esters has recently
been reported for 1,1'-carbonyldiimidazole (CDI)
(Fig. 4.3a, g) [35]. This reagent, previously
employed for the synthesis of amides, ureas, and
carbamates [36-38], reacts with primary amines
and hydroxyl groups of amino acid side chains
(Fig. 4.3c). CDI generates a stable urea or carbamate
by reacting with two primary amines or with an amine
and a hydroxyl group, respectively. In contrast
with NHS esters, CDI mono-links are unstable
and promptly regenerate the free amino acid side
chain. The CDI cross-linking reaction proceeds
efficiently in aqueous buffers at slightly basic pH
(between 7.2 and 8) and at temperatures above 10 °C.

An alternative to lysine-specific reagents is
offered by the carboxyl-selective reagents. Their
application to chemical XL-MS investigation was
reported for the first time in 2008 by Petr Novak
[39], based on the reaction of dihydrazide
cross-linkers with pre-activated carboxylic groups
to form stable bonds (Fig. 4.3d, a). The activation
of carboxylic acids was initially reported
using EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide
hydrochloride) (Fig. 4.3d, b)
[38, 40], and more recently achieved using
DMTMM (4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium
chloride) (Fig. 4.3d, c) [41, 42].
Both EDC and DMTMM allow a
single-pot reaction with dihydrazide-based
cross-linkers, but under substantially different
reaction conditions (Fig. 4.3e). EDC activation was
described as a 2-h reaction in pyridine/hydrochloride
buffer (pH 5.2) at room temperature [39]. DMTMM
activation, on the other hand, is carried out in
aqueous buffer (e.g., HEPES or PBS) at pH 7.4 and
37 °C for 45 min. The milder conditions required
by DMTMM extend its application to complexes
unstable under non-physiological conditions [41].

Solubility of cross-linking reagents is another 
important feature to take into consideration. 
Hydrophobicity is an advantage for the analysis 
of purified membrane proteins or for in-vivo 


Fig. 4.3 (continued) dihydrazides (a) and EDC and
DMTMM activators (b-c) required for cross-linking acidic
residues. (e) Activation of carboxyl side chains by
DMTMM and EDC. The DMTMM activation of carboxylic
acids goes through a nucleophilic aromatic substitution
of the N-methyl-morpholine with the carboxylic acid,
generating an activated ester intermediate (a). EDC
activation of carboxylic acids forms a reactive O-acylisourea
intermediate (b). Both activated intermediates react
with a dihydrazide nitrogen (c), forming a stable bond (f).
In addition, reactions with either primary amines (d), e.g.,
lysine side chains and N termini, or hydroxyl groups (e)
from serine, threonine, and tyrosine side chains, occur,
forming stable amide bonds (g) or esters (h)
cross-linking. Hydrophobic cross-linkers are
membrane permeable and can react with regions
of proteins on both the external and the internal
side of membranes or micelles. Furthermore, they
can cross the membrane of living cells and fix
transient complexes in vivo. However, the low
water solubility of hydrophobic cross-linkers
requires their pre-solubilization in organic
solvents such as DMSO. To limit the chaotropic
activity of organic solvents, which can interfere
with the native folding of the protein complex,
their concentration in the sample solution should
be kept as low as possible.

The water solubility of NHS esters is low, and
stock solutions are usually prepared in DMSO.
Sulfo-NHS esters are water-soluble analogues of NHS
esters (Fig. 4.3a, a). The negatively charged sulfo
substituent on the succinimide ring increases their
solubility in aqueous buffers and prevents membrane
permeation. The solubility of dihydrazide
cross-linkers is promoted by the polarity of the
dihydrazide functional groups and limited by the
length of the hydrophobic spacer chain.

The main role of linker chains in cross-linking
reagents is to set the distance cut-off between two
amino acids. The longer the spacer, the wider
the probed surface: the chance of cross-link
formation increases, but the spatial resolution of
the structural information acquired decreases. The
highest resolution is reached by zero-length
cross-linkers, which promote the condensation of
amino acid side chains either without introducing
any additional atom, e.g., EDC and DMTMM
(Fig. 4.3d, b, c), or by inserting a single
carbonyl group, e.g., CDI (Fig. 4.3a, g) [35, 39,
41]. Despite the high-resolution potential of
zero-length cross-linkers, the spacer length of most
reagents reported for XL-MS studies is around
12-14 Å (e.g., DSS, BS3, DSSO, and DSBU).

In addition to setting the distance between
reactive groups, additional features of the spacer
chain, such as biotinylation, isotopic composition,
and incorporation of an MS-cleavable group, play
primary roles in the enrichment or identification
of cross-linked peptides.

Biotinylated linkers were designed to simplify
the enrichment of cross-linked peptides by pull-down
with avidin- or streptavidin-coated beads.
An example of a biotinylated cross-linker is the
homobifunctional NHS ester CBDPS
(Cyanur-Biotin-Dimercapto-Propionyl-Succinimide)
(Fig. 4.3a, e) [43, 44].

Isotopic composition and inclusion of MS 
cleavable groups are chemical features of the 
linker chain aimed at improving the discrimina- 
tion between cross-linked and free peptides dur- 
ing the MS data analysis. 

Isotopically labeled cross-linkers are variants
of the same reagent with a different isotopic
composition of the spacer chain. By mixing the
reagents at a defined ratio of heavy and light
forms, cross-links are generated whose mass
difference and relative MS intensity reflect the
number of isotopic labels and the mixing ratio.
After digestion and MS analysis, the search for
cross-linked peptides is first focused on the
detection of parent ion pairs corresponding to the
heavy and light forms of the same cross-link. After
that, the MS/MS spectra of the shortlisted pairs are
searched against the sequence database, restricting
the search space and simplifying the assignment
procedure [45]. Isotope-labeled cross-linkers
selective for both primary amines and carboxylic
acids are commercially available and reported in a
number of studies [29, 41, 46-48].
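
The pair-detection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any specific search engine: the function name, the input layout, the 12-deuterium mass shift, the 1:1 mixing ratio, and the tolerances are all assumptions chosen for the example, and real workflows would also require co-elution of the two forms.

```python
from itertools import combinations

# Assumed mass shift (Da) between heavy and light linker forms:
# 12 hydrogen-to-deuterium substitutions (hypothetical d0/d12 reagent pair).
DELTA_MASS = 12 * (2.014102 - 1.007825)


def find_isotope_pairs(precursors, delta=DELTA_MASS, ppm_tol=10.0,
                       max_ratio_error=0.5):
    """Return (light, heavy) precursor pairs consistent with a labeled cross-link.

    precursors: list of dicts with a neutral 'mass' (Da) and an 'intensity'.
    A pair is kept when the mass difference matches `delta` within `ppm_tol`
    and the heavy/light intensity ratio is close to 1 (assumed 1:1 mixing).
    """
    pairs = []
    ordered = sorted(precursors, key=lambda p: p["mass"])
    for light, heavy in combinations(ordered, 2):
        expected = light["mass"] + delta
        if abs(heavy["mass"] - expected) / expected * 1e6 > ppm_tol:
            continue
        ratio = heavy["intensity"] / light["intensity"]
        if abs(ratio - 1.0) <= max_ratio_error:
            pairs.append((light, heavy))
    return pairs
```

Only the MS/MS spectra of the precursors returned by such a filter would then be passed to the database search, which is what restricts the search space.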

An additional strategy aimed at simplifying
and driving the MS data analysis toward the
identification of cross-linked peptides is based on
cleavable linkers. The spacer chain of these
reagents contains a chemical bond that is
fragmented by collision-induced dissociation (CID),
higher-energy collisional dissociation (HCD), or
electron transfer dissociation (ETD) during the MS
analysis. The inclusion of an MS-cleavable linker
allows the fragmentation of cross-linked peptides
into two modified linear peptides, improving their
detection in very complex samples and in
proteome-wide experiments [49-55]. Disuccinimidyl
dibutyric urea (DSBU, also known as BuUrBu;
Fig. 4.3a, d) [56] and disuccinimidyl sulfoxide
(DSSO; Fig. 4.3a, f) [57] are examples of
commercially available cleavable reagents. The wide
range of investigations made possible by cleavable
cross-linkers has led to the rapid development of
new compounds with different reactivity, such as
photoactivatable reagents [35, 58-60].

A further class of cleavable linkers is
represented by thiol-cleavable reagents, e.g.,
dithiobis(succinimidyl propionate) (DSP, also known
as DTSP; Fig. 4.3a, c) [61]. Characterized by linker
chains containing a disulfide bond, these reagents are
both MS and chemically cleavable. These
features are attractive for the transient stabilization
of labile complexes during their extraction
and purification. The reduction of the disulfide
bond after digestion releases tagged linear
peptides whose identification is easier than that
of cross-linked peptides, improving the protein
sequence coverage and the disclosure of networks
of protein interactions.

Cross-linked peptides are analyzed by reversed-phase
liquid chromatography and mass spectrometry.
The peptides are separated on a gradient of
acetonitrile acidified with formic acid and tailored
to the sample complexity. The data acquisition
protocols vary according to the cross-linker and
the instrument used. MS/MS methods based on
data-dependent acquisition and CID or HCD
fragmentation are used for both isotope-labelled and
cleavable cross-linkers (e.g., DSBU) [53, 56,
62]. Furthermore, ETD fragmentation and MS³
methods have been reported for the cleavable
cross-linker DSSO [63].

The MS data analysis and the search
strategy depend on the cross-linker features and
the data acquisition protocol used. Several software
packages are now available for the
analysis of MS data acquired from cross-linking
experiments: xQuest, StavroX/MeroX, and
XlinkX form a non-exhaustive list of the most
frequently reported software packages [41, 50,
64, 65]. Cross-link identification relies on the
in silico generation of a database of cross-linked
peptides used for the search and assignment of MS
data. Databases are usually seeded with decoy
proteins (e.g., reversed protein sequences) to
provide a statistical validation of the assignment by
calculating the false discovery rate (FDR). As a
final validation step, the spectral data assigned to
cross-links are manually interrogated.

The list of validated cross-linked peptides is
used to generate topological maps of interactions
between the proteins in the complexes, and several
software applications are available for this
purpose, e.g., xVis and xiNET [66, 67]. In addition to
the topological mapping, the identified cross-links
can be mapped and checked on available
structural data. For this purpose, a dedicated
software program (Xlinks) has been developed and
integrated in 3D viewer software [68].
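
Checking cross-links against a structure essentially amounts to comparing the distance between the two linked residues with the cut-off imposed by the spacer. The sketch below assumes that Cα coordinates have already been extracted into a dictionary keyed by (chain, residue number); the 30 Å default cut-off and all names are illustrative assumptions, not values taken from the tools cited above.

```python
import math


def ca_distance(coords, res_a, res_b):
    """Euclidean distance (in Å) between two Cα atoms.

    coords maps a residue identifier, e.g. ("A", 10), to an (x, y, z) tuple.
    """
    return math.dist(coords[res_a], coords[res_b])


def check_cross_links(coords, cross_links, max_distance=30.0):
    """Label each cross-link as satisfied (True) or violated (False).

    max_distance is an assumed Cα-Cα cut-off; it should be adapted to the
    spacer length of the reagent and the flexibility of the side chains.
    """
    report = []
    for res_a, res_b in cross_links:
        d = ca_distance(coords, res_a, res_b)
        report.append((res_a, res_b, round(d, 1), d <= max_distance))
    return report
```

Violated cross-links are not necessarily wrong assignments: they may also point to conformational flexibility or to an alternative arrangement of the complex, which is why they are usually inspected rather than discarded automatically.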


4.4.2 Hydrogen Deuterium Exchange (HDX)


Methods for mass spectrometric data acquisition 
are designed and tailored to improve the identifi- 
cation of unknown proteins or the quantitative 
determination of known proteins. Proteins have 
a flexible and dynamic nature necessary to exert 
their chemical and biological activity. Structural 
investigation methods such as X-ray crystallogra- 
phy and cryo-EM provide mostly static 
representations of the most stable conformations 
in specific experimental conditions. Insights into 
the conformational changes and flexibility of the 
proteins provide a deeper understanding of pro- 
tein function and mechanism of activity. As 
summarized in Fig. 4.2d, Hydrogen Deuterium 
Exchange (HDX) is a method for the study of 
conformational dynamics of proteins and protein 
complexes. It is based on monitoring the rate of 
proton exchange between protein backbone and 
deuterated solvent. The rate of proton exchange
of a protein region depends on both solvent
exposure and structural organization. Defined
secondary structures such as α-helices and
β-sheets display a slower exchange rate than
unstructured regions (e.g., loops or disordered
segments). Therefore, the comparison of deuterium
incorporation rates allows the identification
of structured and unstructured regions of the
proteins. Peptidyl hydrogens, due to their low
exchange rate (with half-times ranging from fractions
of a second to days), can be effectively monitored by
mass spectrometry. The exchange rates of other
hydrogens (e.g., those bound to N, O, and S of
amino acid side chains) are too high to be monitored
without back-exchange interference.
Hydrogen/deuterium exchange is based
on acid-base reactions. The exchange rate is
influenced by pH, temperature and, as mentioned
above, structural factors (e.g., secondary, tertiary,
and quaternary structures of proteins and protein
complexes). Ice-cold buffer at pH 2.5 minimizes
the exchange rate of peptidyl hydrogens; for
an unstructured peptide, this leads to a half-time
of exchange longer than 1 h. Conversely, the
exchange rate is increased by higher temperature
and by higher or lower pH values.

Once the reaction conditions are defined (temperature,
buffer, and pH) and kept constant, the
intrinsic exchange rate (k_i) for an unstructured
peptide can be determined, providing a measure
of the influence of the primary sequence on the
exchange rate [69].

The kinetic constants for hydrogen/deuterium
exchange of folded proteins (k_ex) differ by several
orders of magnitude from those of unstructured
peptides (Fig. 4.4a). This difference is due to
secondary structure stabilization by hydrogen
bonding, which requires a transient rupture and
reformation of the bonds to exchange the hydrogens
with solvent deuterium. The dynamic equilibrium
of protein unfolding/refolding permits amide
hydrogen exchange even in the most structured
regions. Figure 4.4b summarises the hydrogen
deuterium exchange in a structured protein.
Constants k_1 and k_-1 are the unfolding and
refolding rate constants, respectively, and k_2 is
the hydrogen deuterium exchange rate constant
in the unfolded protein (approximately equal to
the intrinsic exchange rate, k_2 ≈ k_i). The overall
exchange rate can be split into two contributions:
the unfolding/refolding equilibrium (k_1/k_-1) and
the intrinsic exchange rate of the unfolded
sequence, k_2. Based on the ratio between k_2 and
k_-1, two main hydrogen exchange kinetic profiles
are identifiable. If the exchange constant k_2 is
Fig. 4.4 Hydrogen deuterium exchange reaction. (a) Single-step
chemical reaction of hydrogen/deuterium
exchange for a generic unstructured peptide (Pept)
occurring at an intrinsic exchange rate (k_i) in
deuterated water (D2O). (b) Representation of the multiple
much larger than the refolding constant
k_-1 (k_2 >> k_-1), the exchange is very rapid and
takes place every time a hydrogen bond is broken.
In these conditions the overall exchange kinetic
constant k_ex depends only on the unfolding rate
constant k_1 (k_ex = k_1). This condition,
termed EX1 kinetics or monomolecular exchange,
is not very common and is mostly observed when
unfolded states are favored by sequence or
induced by denaturants. A more common situation
displays a refolding rate constant (k_-1) much
larger than the exchange rate constant (k_2). In
these conditions, when k_-1 >> k_2, the rupture
and regeneration of a hydrogen bond is faster
than the hydrogen exchange. Known as EX2
kinetics or bimolecular exchange, this exchange
kinetics allows the determination of the unfolding
equilibrium constant K_unf and the kinetic
constant k_2 (Eq. 4.5) [70].

$$k_{ex} = \frac{k_1}{k_{-1}} \times k_2 = K_{unf} \times k_2 \tag{4.5}$$

For certain proteins or complexes, EX1 and EX2
profiles coexist and can be
discriminated by the spectral data [71, 72].
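
The two limiting regimes can be checked numerically. The sketch below uses the general steady-state expression for the two-step scheme, k_ex = k_1 k_2 / (k_-1 + k_2), which is an assumption added here (it holds when the folded state dominates, i.e. k_-1 >> k_1) and is not written out in the text; the rate values are arbitrary illustrations in s^-1.

```python
import math


def k_ex(k1, k_minus1, k2):
    """Observed exchange rate constant for the unfolding/exchange scheme.

    Assumes native-state conditions (k_-1 >> k_1), under which the
    steady-state solution reduces to k1 * k2 / (k_-1 + k2).
    """
    return k1 * k2 / (k_minus1 + k2)


# EX1 limit (k_2 >> k_-1): every opening event leads to exchange, k_ex ~ k_1.
ex1 = k_ex(k1=1e-3, k_minus1=1e-1, k2=1e4)

# EX2 limit (k_-1 >> k_2): k_ex ~ (k_1 / k_-1) * k_2 = K_unf * k_2.
ex2 = k_ex(k1=1e-3, k_minus1=1e3, k2=1.0)

print(f"EX1: k_ex = {ex1:.3e} s^-1 (k_1 = 1.000e-03)")
print(f"EX2: k_ex = {ex2:.3e} s^-1 (K_unf * k_2 = {1e-3 / 1e3 * 1.0:.3e})")
```

In the EX2 regime, measuring k_ex and estimating k_2 from the intrinsic rate of the unstructured sequence therefore gives access to K_unf, as stated by Eq. 4.5.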

HDX experiments start with the protein incu- 
bation in deuterated water for defined amounts of 
time allowing the hydrogen to freely exchange 
with deuterium. The number of time-points 
depends on the experimental design and on the 
structure investigated, and appropriate incubation 
times can range from seconds to days. The 
exchange reaction is quenched by adding an 
ice-cold buffer solution at pH 2.5. These 
conditions limit the back-exchange of the back- 
bone amide incorporated deuterium and are kept 
steps reaction of hydrogen/deuterium exchange for a
hypothetical protein region. The overall exchange rate
depends on two contributions: the unfolding (k_1) and
refolding (k_-1) rate constants, and the intrinsic exchange
rate of the unfolded region (k_2)
constant for all the following analytical steps. In a
peptide-based HDX-MS approach, the protein is
digested using an acid-active protease (e.g.,
pepsin) before LC-MS/MS analysis. To prevent
deuterium back-exchange, the chromatographic
step must be rapid and performed at low
temperature. Pepsin-functionalized LC columns
are available for rapid digestion and direct data
acquisition during LC-MS/MS. This approach
generates overlapping short peptides, allowing
high sequence coverage and determination of
exchange rates at peptide resolution.

The data analysis starts with the verification of
deuterium incorporation, then continues with the
peptide identification and the evaluation of the
number of hydrogens exchanged. The successful
incorporation of deuterium is checked by compar- 
ing the isotopic profile of unlabeled and deuterated 
samples. For both unlabeled and completely 
deuterated peptides the isotopic distribution is 
expected to appear as a Cauchy distribution with 
shifted m/z values. However, regardless of the 
relative amount of incorporated deuterium, the 
isotope distribution of labeled peptides displays a 
symmetrical Gaussian distribution. This is due to 
deuterium back-exchange and the maximum deu- 
terium incorporation is therefore identified by the 
maximum shift of m/z values [73]. 

Once the incorporation of deuterium is con- 
firmed, data analysis proceeds with the peptide 
identification. Pepsin, in contrast with trypsin, has 
a non-specific activity with a preference for 
hydrophobic amino acids making the hydrolysis 
unpredictable and peptide identification more 
complicated. Therefore, different approaches 
from those described for protein identification in 
Chap. 3 are required for data interpretation. 

Peptide identification is based on the generation
of a peptide list from the LC-MS/MS analysis
of a non-deuterated sample. The m/z value of each
MS ion detected is screened against a database
containing all the possible peptide sequences
generated by sample digestion and is pre-assigned
to a peptide using an exact-mass approach.
The MS/MS data are then used to confirm the
assignment. The peptide list generated will
include the monoisotopic mass, the charge
state, and the retention time of the identified
ions together with the amino acid sequence
assigned. The aim of this list is the generation of
an indexed database of assigned peptides to be
used for the assignment of deuterated isoforms.
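
The exact-mass pre-assignment can be sketched as follows. Because pepsin cleavage is treated as non-specific, every contiguous sub-sequence within a length window is considered a candidate, and an observed ion is matched to candidates whose theoretical m/z lies within a ppm tolerance. The function names, length window, and tolerance are assumptions for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mz(sequence, charge):
    """Theoretical m/z of an unmodified peptide at the given charge state."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (mass + charge * PROTON) / charge


def candidate_peptides(protein, min_len=5, max_len=20):
    """Yield (1-based start, sequence) for all contiguous sub-sequences."""
    for start in range(len(protein)):
        for length in range(min_len, max_len + 1):
            if start + length > len(protein):
                break
            yield start + 1, protein[start:start + length]


def match_ion(mz, charge, protein, ppm_tol=5.0, min_len=5, max_len=20):
    """Return candidate peptides whose theoretical m/z matches within ppm_tol."""
    hits = []
    for start, peptide in candidate_peptides(protein, min_len, max_len):
        theoretical = peptide_mz(peptide, charge)
        if abs(theoretical - mz) / theoretical * 1e6 <= ppm_tol:
            hits.append((start, peptide))
    return hits
```

Several candidates can share the same mass (for example, sequences differing only by leucine and isoleucine), which is why the MS/MS data are still needed to confirm each assignment.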

The deuterium incorporation is measured by
relative quantification between the deuterated
samples and the non-deuterated ones. If the
exchange of a given peptide follows an EX2
kinetic profile, deuterium incorporation increases
progressively with exposure time. The identification
of deuterated peptides is based on the
precomputed peptide list, considering parameters
such as charge state, shift of m/z values, and
increase in mass of the isotope distribution.
Once the deuterated peptide is assigned, a relative
quantification of the incorporated deuterium is
possible. Therefore, for each peptide it is possible
to evaluate the relative amount of deuterium as a
function of time. A visual summary of the
exchange is achieved by plotting the amount of
deuterium incorporation over time. For similar
purposes, many other representations that consider
the whole sequence of the protein can be
generated. For example, the deuterium levels can be
rendered on peptide maps or on the ribbon of
3D models, or difference plots can be used to compare
different experimental conditions for the same
proteins [74-77].
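
One common way to perform this relative quantification is to compare the intensity-weighted centroid of the isotope envelope before and after labeling. The sketch below assumes that the envelope of each peptide is available as (m/z, intensity) pairs and that a maximally deuterated control has been measured; the function names are hypothetical, and the centroid approach is only one of several possible choices.

```python
def centroid_mz(peaks):
    """Intensity-weighted mean m/z of an isotope envelope.

    peaks: iterable of (m/z, intensity) pairs for one peptide ion.
    """
    peaks = list(peaks)
    total = sum(intensity for _, intensity in peaks)
    return sum(mz * intensity for mz, intensity in peaks) / total


def deuterium_uptake(undeuterated, deuterated, charge):
    """Mass gained on labeling (Da), roughly the number of deuterons taken up."""
    return (centroid_mz(deuterated) - centroid_mz(undeuterated)) * charge


def relative_uptake(undeuterated, deuterated, fully_deuterated, charge):
    """Uptake expressed as a fraction of the maximally deuterated control."""
    max_uptake = deuterium_uptake(undeuterated, fully_deuterated, charge)
    return deuterium_uptake(undeuterated, deuterated, charge) / max_uptake
```

Evaluating `relative_uptake` for each labeling time point of a peptide gives the values that are plotted as an uptake curve or mapped onto the sequence or structure.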

Interest in HDX-MS for the investigation
of proteins and protein complexes from both
structural and functional points of view is
constantly increasing. The development of new
methods as well as the availability of automated
systems and instrumental techniques has allowed
the application of this technique to a wider list of
targets, ranging from simple ligand-protein
interactions to protein-protein interactions. The
clear advantage of HDX-MS in comparison to
other methods for peptide-based mass spectrometry
lies in the time-course investigation, which leads
to a better observation of the dynamics of structural
changes [78-81].
Abstract


There are myriads of protein-protein 
complexes that form within the cell. In addi- 
tion to classical binding events between 
globular domains, many protein-protein 
interactions involve short disordered protein 
regions. The latter contain so-called linear 
motifs binding specifically to ordered protein 
domain surfaces. Linear binding motifs are 
classified based on their consensus sequence, 
where only a few amino acids are conserved. 
In this chapter we will review experimental 
and in silico techniques that can be used for 
the discovery and characterization of linear 
motif mediated protein-protein complexes 
involved in cellular signaling, protein level 
and gene expression regulation. 
5.1 Introduction

Two decades of exploration on the disordered 
part of the proteome highlighted the role of 
unstructured protein segments in protein-protein 
complex formation. Characterization of the disor- 
dered proteome of various organisms revealed 
that there might be thousands of functional sites 
within these poorly characterized protein regions, 
possibly governing millions of protein-protein 
interactions (PPI) within the cell [1]. Protein- 
protein complexes involving some disordered 
fragments are thus abundant and are different 
from classical PPIs forming between two globular 
domains. The fundamental difference is that disordered
protein regions bind to their interacting
domains via amino acids that are sequential, and thus
linearly organized, in the interacting protein;
hence they are also referred to as short linear
motifs (SLiMs) [2]. Linear motifs are physically
defined by their ability to undergo a disorder-to-order
transition upon binding to their partners
(folding when binding). There is considerable confusion
about SLiMs, especially in the older literature
(more than 10 years old). Although various protein
stretches were often designated by biologists as
motifs, only true linear motifs (lacking a fixed,
intrinsic structure of their own) can act as autonomous
binding elements.

In contrast to protein interfaces formed 
between globular domains, where interacting 
residues come from different parts of the 3D 
structure, the contact residues of linear motifs are
contained within a relatively short (3-25 amino
acids) stretch. In principle, this should make the
identification of linear motifs binding to the same
protein domain straightforward, based on their
sequence similarity. Unfortunately, the low information
content of SLiM sequences renders them
elusive for simple consensus motif based in silico
searches [3]. This is because only a few fixed
positions (as few as 2-3 amino acid residues,
albeit more typically 4-5) define a particular
type of linear motif, called a class. The Eukaryotic
Linear Motif (ELM) database currently
contains more than 200 motif classes with more
than 2000 instances [4]. Linear motifs generally
bind to shallow but dedicated surfaces located on
well-structured protein domains (catalytic and
adaptor domains alike) and can adopt diverse
conformations upon binding (Fig. 5.1a). As
these binding elements are functionally modular,
SLiMs can also display an island-like evolutionary
conservation, having the characteristics of
isolated peptides [5]. Linear motifs tend to remain
highly exposed to the solvent, even in their
bound, folded state. This allows a great deal of
flexibility in their amino acid sequence, as
residues not contacting the partner protein surface
can often be arbitrary [5].
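
The low information content of a consensus motif is easy to demonstrate with a regular-expression search. In the sketch below, the pattern `P..P` (two fixed prolines separated by two arbitrary residues) and the example sequence are purely illustrative assumptions, and the expected number of chance matches is computed under the simplifying assumption of uniform amino acid frequencies.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlaps included.

    `pattern` is a plain regular expression without capturing groups;
    it is wrapped in a lookahead so that overlapping matches are reported.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


def expected_random_hits(n_fixed, sequence_length, motif_length, alphabet=20):
    """Expected chance matches in a random sequence of uniform composition."""
    positions = max(sequence_length - motif_length + 1, 0)
    return positions * (1 / alphabet) ** n_fixed


hits = find_motif("MAPLTPPSPEPRAA", "P..P")
chance = expected_random_hits(n_fixed=2, sequence_length=500, motif_length=4)
```

With only two fixed positions, a pattern of this kind is expected to occur more than once by chance in a random 500-residue sequence, which illustrates why consensus matching alone yields many false positives and needs additional filters such as disorder prediction or conservation.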

The binding affinity of linear motifs varies
widely, from the nanomolar to the high micromolar
range, depending on the number of contact
residues, the entropic cost of becoming ordered,
and the nature of the secondary interactions
between the motif and the domain surface
(Fig. 5.1b). In addition, SLiMs can be subject to
secondary modifications and can thus function
as dynamic switches [6]. Phosphorylation of
residues in key positions often alters the binding
affinity of linear motifs, which forms the molecular
basis of feedback regulation and dynamic
responses in cellular signaling [7]. This chapter
will include a short review of experimental PPI
techniques that can be used for linear binding
motif discovery. We shall also discuss how in
silico predictions can be used to find putative
motif classes and motif instances at the level of
proteomes, as well as the importance and pitfalls
of cell-based validation tools. Finally, we
highlight how linear motifs are used as building
blocks in nature, to make functional protein
assemblies and highly dynamic complexes.


5.2 Experimental PPI Techniques for Linear Binding Motif Discovery


Protein-protein interaction compendia were first 
assembled by researchers interested in the molec- 
ular basis of specific physiological processes. 
This was a slow-paced approach, and binding
between two proteins was primarily probed by
pull-down and co-immunoprecipitation (co-IP)
experiments. Each experiment required the careful
preparation of the bait and its binders (called
preys) for in vitro binding studies, while prey 
identification for a given bait in cell-based assays 
depended on specific antibodies or required 
sophisticated analytical know-how (e.g., mass 
spectrometry-based sequencing of tiny amounts 
of in-gel digested protein material). In these 
low-scale experiments identification of new inter- 
action partners required a great effort and thus 
progress was slow. Eventually, the technology 
for handling entire open reading frame (ORF) 
collections was developed. In addition, new 
techniques for the in vitro and cell-based identifi- 
cation of binary interaction pairs emerged (e.g., 
yeast two hybrid, Y2H, and protein-fragment 
complementation assay, PCA) that turned out to 
be suitable for large-scale experiments [8]. 
Furthermore, affinity purification (AP) tags 
attached to bait proteins allowed the identification 
of intact complexes whose components could 
be identified by mass spectrometry (AP-MS) 
[9]. These developments enabled high-throughput
techniques to explore protein-protein connectivity
maps at the level of proteomes, giving hope that
an understanding of cellular functions at the systems
level would shortly be within reach. Thus, specific
cellular functions (e.g., cellular signaling, metabo- 
lism and transcription) could be put into a bigger 
context by piecing smaller PPI networks together 
through common nodes [10]. As it turned out, 
these generic high-throughput PPI discovery 
tools still have fundamental limitations and need 
Fig. 5.1 Properties of linear motif mediated protein-protein
complexes. (a) Linear motifs undergo a disorder-to-order
transition upon binding and become folded. They
can adopt an enormous variety of conformations in their
bound form, including α-helices (MDM2-p53 motif),
β-sheets (PP2BA-NFAT1 motif), type II polyproline helices
(SMURF1-SMAD1 motif) and turns (KEAP1-NRF2


to be complemented by low-scale PPI validation 
studies. 

About two decades ago it already became 
apparent through classical low-scale deletion 
mapping (where the regions responsible for a 
binary interaction could be identified by system- 
atic deletions of segments from full-length 
proteins) that many protein-protein interactions 
are mediated by surprisingly short fragments. 
These SLiMs are autonomous and bind to 




motif). (b) These interactions vary in their binding
strength, across almost six orders of magnitude. The strongest
linear motifs might show a K_D dissociation constant
below the nanomolar range, while the weakest are nearly
millimolar. However, most known examples have a binding
affinity between K_D = 100 nM and 10 µM


structured domains of their partners. Currently, 
there are several high-throughput PPI 
technologies available for researchers interested 
in exploring linear motif mediated interactions 
[11]. Chemical synthesis on a solid support
might also be used to generate libraries of
100-1000 unique short peptides, which can then
be probed for binding to globular proteins
[12]. Phage display is a more biological approach
that can be applied to select short protein
fragments binding to a functionalized surface
[13]. The successful binders can be easily
identified by DNA sequencing from complex
libraries (>10^10 clones), as there is a clonal
relationship between the phage and its displayed
protein sequence that enables its binding to the target
protein surface. Only short protein fragments
(e.g., SLiMs) can be efficiently displayed by
phages; therefore, this technique is a great tool to
identify linear motifs binding with high affinity
(in the low micromolar, but more realistically in
the nanomolar range). Because of the high complexity
of phage libraries, the full human proteome
can be represented as a large collection of
short protein fragments (10-20 amino acids) and
then probed against any target protein surface for
interaction [14]. For example, this type of
proteomic phage display was used to find
peptides that can bind to PDZ (postsynaptic
density protein 95/discs large/zona occludens 1)
domains found in the human proteome
[15]. Using the same phage library that represents
the disordered part of the proteome as short
fragments (e.g., as 15-20 amino-acid-long, partially
overlapping stretches), formerly unimaginably
rich PPI datasets could be obtained for basically
any globular protein [13].
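
The representation of a protein region as partially overlapping short fragments can be sketched as a simple tiling routine. The fragment length of 16 residues and the step of 4 residues are assumptions picked from within the range quoted above, not the design parameters of any published library; anchoring the last tile at the C-terminus is likewise a design choice of this example, made so that the terminal residues are always covered.

```python
def tile_sequence(sequence, length=16, step=4):
    """Split a sequence into overlapping fragments of fixed length.

    Returns a list of (1-based start, fragment). Sequences shorter than
    `length` are returned as a single fragment, and an extra tile anchored
    at the C-terminus is added when the regular grid does not reach it.
    """
    if len(sequence) <= length:
        return [(1, sequence)]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, sequence[s:s + length]) for s in starts]
```

Applying such a function to every predicted disordered region of a proteome would give the list of peptide sequences to be encoded in the library.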

There are several cell-based techniques that 
could be used for peptide motif discovery, 
although they were initially applied as general 
PPI discovery tools. These include the most 
widely used yeast two hybrid (Y2H), bacterial 
two-hybrid (B2H) and its mammalian equivalent 
(M2H), where a bait-prey interaction will activate 
the transcription of a reporter gene or enzyme 
[16, 17]. Yeast surface display is now also
available, an approach paralleled only by phage
display [18]. In contrast to these, affinity purification
mass spectrometry (AP-MS) is better suited
to gaining information about protein-protein
complexes present in a native cellular environment,
but it is less suitable for detecting low-affinity
interactions. Conversely, Y2H and M2H are far
better suited to detecting low-affinity and transient
PPIs, which are often the hallmarks of SLiM
mediated interactions. There are other cell-based
techniques that might also be used for proteome
level analysis, although they are better suited to
validating the hits of primary screens (see later).
Protein-fragment complementation assay (PCA) based
techniques exploit the fact that some reporter
proteins (e.g., GFP or luciferase) can be split up
and fold only if their two fragments are held
together by bait-prey binding [19-21]. In addition,
the mammalian protein-protein interaction
trap (MAPPIT) analysis is an interesting 
two-hybrid based assay, designed to ameliorate 
the high false negative rate of classical 
two-hybrid methods. Here, the ligand-dependent 
transcriptional output of an artificial cytokine sig- 
naling pathway is restored only if there is an 
interaction between bait and prey [22]. The 
various assay designs that can be potentially 
used to explore SLiM-mediated interactions and 
complexes are listed in Table 5.1. 

Early estimates of peptide motif mediated PPIs
reckoned that 15-40% of protein associations
within the cell might be mediated by linear motifs.
Using the above-described low-scale and high-
throughput screening (HTS) methods, it is now
well established that SLiMs are indeed extremely
abundant. Challenges, however, still abound.
For example, standardized protocols for HTS hit
validation need to be worked out and gain acceptance
by the research community [23]. Validation
will require testing of the identified SLiM in the
context of the full-length protein and in cell-based
assays in order to eliminate the high false positive
discovery rate of some current HTS techniques.
On the other hand, as most SLiM mediated
interactions are weak (micromolar or weaker in affinity),
experimental methods involving washing steps to
eliminate nonspecific binding are plagued by false
negatives. Overall, results of generic HTS studies
need to be better aligned with those of more classical
low-scale studies on SLiMs, or at least the
discrepancies need to be understood. Finally, the
importance of posttranslational modifications
(PTMs) also needs to be addressed, as it is now
established that short linear motif mediated
interactions are also greatly affected by
various PTMs (e.g., phosphorylation, acetylation,
sumoylation) [6].

Table 5.1 Summary of PPI techniques with pros and cons

| Name of main method | Sub-method or variant | Recommended for | Potential disadvantages | References |
|---|---|---|---|---|
| Peptide or protein arrays | Solid-phase peptide array binding (with radioactive detection) | Strong binding events known to rely on a small epitope | Not suitable for “naive” screening, due to limited array sizes. Hydrophobic epitopes might not be exposed correctly. | [12] |
| Biological surface display selection | Classical phage display (M13 bacteriophage) | Any linear motif dependent interaction (with unknown motifs) | Will not capture transient interactions. Library bias can be a problem for longer motifs. | [13] |
| | Phage display with modifying enzyme (e.g. phosphorylation) | Enzyme-substrate type interactions (transient binding) | Only useful for certain enzymes. Capturing of modified epitopes needs to be robust. | [43] |
| | Yeast surface display | Glycosylated, or other large extracellular epitopes | Relatively limited library size. More suitable for folded domains than motifs. | [18] |
| Affinity purification - mass spectrometry (AP-MS) | AP-MS with tandem affinity purification (TAP-tagging) | Well-defined, constitutive, soluble complexes (regardless of size) | Has difficulty in capturing dynamic and low-affinity interactions. Results need to be adjusted for cellular protein abundance. | [9] |
| Protein fragment complementation assays (PCA) | Split-luciferase based assays (e.g. NanoLuc) | Interactions already suspected, based on other methods. | Not well suited for large-scale screening. Fusion points need to be optimized for each protein. | [40] |
| | Bimolecular fluorescent complementation (BiFC) | Interactions already suspected, based on other methods. | Not well suited for large-scale screening. Fusion points need to be optimized for each protein. | [20, 21] |
| Two-hybrid screening (2H) | Yeast two-hybrid, classical Gal4 (Y2H) | Strong, preferably nuclear protein complexes | Comes with a considerable false positive rate. Self-activating clones are a problem. | [8] |
| | Yeast two-hybrid (Y2H) with modifying enzyme | Conditional interactions (positive switches), such as phosphotyrosines | The modifying enzyme system needs to be operational in yeast and orthogonal to endogenous systems (rarely satisfied). | [42] |
| | Bacterial two-hybrid (B2H), adenyl cyclase based | Large libraries, well suitable for non-nuclear proteins | Due to copy number variation of plasmids, the read-out can be quite noisy. | [16] |
| | Mammalian two-hybrid (M2H), Gal4 based | Complexes reliant on organism-specific secondary modifications | Limited library size. | [17] |
| Signaling pathway reconstitution | Mammalian protein-protein interaction trap (MAPPIT) | Protein complexes only formed correctly in mammalian cells | Short-lived interactions will not be detected. Nuclear proteins can be problematic. | [22] |
| Proximity-based protein labelling | In vivo biotinylation assays (streptavidin capture / MS) | Complicated cellular complexes (including weak interactions) | Labelling does not necessarily imply a direct interaction. Labelling efficiency varies among proteins. | [34] |
| | Solid-phase enzyme activity arrays (e.g. phosphorylation) | Direct or indirect enzyme-substrate interactions (including dynamic ones) | Needs to be optimized for each particular enzyme. Scale-up can be a problem. | [5] |

Table 5.1 (continued)

| Name of main method | Sub-method or variant | Recommended for | Potential disadvantages | References |
|---|---|---|---|---|
| | Enzyme-fused protein activity arrays (e.g. SAMDI) | Can detect weak and transient interactions | Not broadly used. Fusion protein engineering can be problematic. | [33] |
| Crosslinking with mass spectrometry detection (XL-MS) | In vitro / in vivo protein crosslinking | Weak interactions, only forming under certain conditions | Very prone to artifacts. Not particularly suitable for large-scale interactomics. | [35, 36] |

5.3 Prediction of Linear Motifs In Silico

The simplest goal one can have in mind with linear
motifs is to find new instances of otherwise well-
characterized motif classes (which also means
discovering new partnerships). Establishing a
motif requires a detailed structural knowledge
about the interaction, to clarify which amino
acids (positions) are critical for binding, and
which substitutions are tolerated without the loss
of interaction. The resulting consensus motifs
might contain wildcards (“.” or x), where any
amino acid is allowed, positions with multiple
ambiguities [abc], or those with particular amino
acids excluded [^d]. With the regular expression
(RegEx) formalism, it is particularly easy to search
for novel motif candidates in any protein sequence
[24]. This can be conveniently done on online
servers, such as the Eukaryotic Linear Motifs
(ELM) suite. The ELM suite also features an
extensive collection of manually curated, experimentally
validated linear motifs as well as details
for selected examples. Thus, it is an excellent tool
to scan our protein of interest against a complete
collection of established linear motifs [4].

A similar, but more challenging task is the
identification of arbitrary motifs in complete
proteomes. ScanProsite offers an easy opportunity
to do just that: after defining a linear motif,
the user is free to search for potential occurrences
in any proteome of interest [25]. However, it
should be noted that the output of sequence-
based searches should always be filtered before
interpreting the results. To remove spurious and
structurally inaccessible motifs, which may match
to the query sequence but are buried inside a
folded domain or are located in a cellular
compartment that is inaccessible for the binding
partner, some filters are routinely used. These 
structural or topological filters are often combined 
with evolutionary filters. This second step 
removes newly-evolved sequences that might 
not have any important biological function in 
the given organism. The SLiMSearch4 suite
incorporates all these filtering options to find 
user-defined motifs in many proteomes, and it is 
the suggested method of choice when working 
with proteomes of model organisms [26]. 
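
The RegEx formalism described above can be tried out locally before turning to the online servers. The following Python sketch scans a sequence for all (possibly overlapping) instances of a motif written as a regular expression. The motif and the sequence are made-up toy examples, not curated ELM entries, and the function name is hypothetical. The sketch performs the sequence search only; the structural, topological, and evolutionary filters discussed above are not implemented.

```python
import re


def find_motif_instances(sequence, motif):
    """Return (1-based start, matched text) for every motif match.

    The motif is wrapped in a lookahead so that overlapping
    instances are reported as well.
    """
    pattern = re.compile(f"(?=({motif}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Toy motif: a basic residue, any residue except proline,
# a wildcard position, and a leucine.
TOY_MOTIF = r"[KR][^P].L"
TOY_SEQUENCE = "MSKALLGRPDLKRTSLAA"

hits = find_motif_instances(TOY_SEQUENCE, TOY_MOTIF)
for start, match in hits:
    print(start, match)
```

The candidate starting at the arginine followed by proline is rejected by the excluded-residue position `[^P]`, which illustrates how a single restricted position narrows the hit list.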

In silico methods may also be applied to dis- 
cover linear motifs directly from interactomes 
(Fig. 5.2a). De novo identification of linear motifs, 
with no structural knowledge on the interaction, is 
probably the most difficult task. However, in addi- 
tion to heuristic identification of shared motifs in 
PPI partner sets, it is also possible to identify 
motifs with fully automated methods [27]. First, 
one needs to establish a highly reliable set of 
interactor proteins. Then (after filtering) the 
remaining sequences should be subjected to suit- 
able software, such as MEME [28]. These de novo 
methods most commonly apply an alphabet build- 
ing approach to come up with a regular expression 
that covers as many instances of partner proteins as 
possible. While powerful, such approaches have 
many potential pitfalls: they only allow exact 
characters (with no chemical similarity handling) 
and have difficulties with non-standard amino 
acids (such as secondary modification sites). 
Because of the general degeneracy of linear motifs, 
they also require a rather high number of valid 
interactors (which is often unavailable). Therefore, 
newer methods have also been developed, using 
hidden Markov models to make them more
sensitive, although these are still somewhat error- 
prone [29, 30]. 
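
To make the idea of an alphabet-building approach more concrete, the following Python sketch derives a regular expression from a small set of pre-aligned, equal-length peptides. It is a deliberately naive illustration: the function name, the threshold, and the peptides are hypothetical, and this is not how MEME or the other cited tools work internally. It also shows the limitation mentioned above, since residues are treated as exact characters with no notion of chemical similarity.

```python
def build_consensus(aligned_peptides, max_alternatives=3):
    """Derive a regular expression from equal-length, pre-aligned peptides.

    Each column becomes a fixed residue, an ambiguity class such as [KR],
    or a wildcard "." when more than `max_alternatives` residues occur.
    """
    lengths = {len(peptide) for peptide in aligned_peptides}
    if len(lengths) != 1:
        raise ValueError("peptides must be pre-aligned and of equal length")
    parts = []
    for column in zip(*aligned_peptides):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        elif len(residues) <= max_alternatives:
            parts.append("[" + "".join(residues) + "]")
        else:
            parts.append(".")
    return "".join(parts)


# Made-up binder peptides, assumed to be already aligned.
TOY_BINDERS = ["KALL", "RALL", "KGLL", "RSLL"]
consensus = build_consensus(TOY_BINDERS)
print(consensus)
```

With only a handful of binders the resulting expression is easily over-fitted or over-relaxed, which is one reason why such methods need a rather high number of valid interactors.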

For in silico methods the caveat is the 
underlying assumption that all protein-protein 
interactions use the same surface, the same geometry,
and the same motif type. While this might be
true for a few selected examples, protein-protein
interactions may be vastly more complex
(Fig. 5.2b). The fact that the same partner protein,
the same domain, and even the same surface can
admit multiple SLiMs complicates their detection
considerably [5]. Therefore, it is generally
advised to assume multiple consensus motifs for
each interaction. Another related problem is the
inherent degeneracy of linear motifs. Moreover,
natural SLiMs are often not optimized for binding
strength. Consequently, they exist as multiple,
relaxed versions of a hypothetical optimal
consensus [31]. Because of these shortcomings,
purely sequence-based in silico approaches
should be complemented by structure-based scoring
schemes whenever possible [32]. Naturally,
this requires structural templates (protein-peptide
complexes) that represent all viable SLiM binding
modes [5]. Testing three-dimensional complementarity
of peptides to the partner domain
surface will then help to filter hits before starting
their experimental validation.

Fig. 5.2 Approaches and challenges of in silico motif
discovery. (a) In silico methods can be used in two
ways. One might aim to extract the linear motif from a
reliable dataset of interactor proteins, in the form of consensus
motif(s). Once a motif has been found and
characterized, it can also be used to predict new members
of the interactome, previously missed by experiments.
(b) Binary interactions between two proteins may be
based on multiple consensus motifs: there could be multiple
valid motif conformations for a single binding site
(classes I-II in the hypothetical example) or there are
multiple binding sites (class II) per protein domain

5.4 Capture and Validation of Protein-Linear Motif Based Interactions

Generic HTS PPI and in silico-based predictions
described above are likely to have many false
positive hits. Ideally, each new linear motif
instance should be validated by focused 
low-scale or other experimental screens, orthogo- 
nal to the primary screen. There is no universal 
solution for this validation step, albeit there are 
some good practices that need to be followed. 
Putative SLiM mediated interactions need to be 
validated in vitro if the primary screen was cellu- 
lar. Conversely, cell-based assays should be 
performed for confirmation if the primary screen 
was in vitro. Furthermore, if an interaction was 
detected between full-length proteins and it is 
suspected that a short region from one of the 
partners mediates binding, then this should be 
confirmed using an in vitro fragment-based 
experiment. Similarly, if short protein fragments 
were used in the primary screen, then testing the 
SLiM in situ, in the context of the full-length 
protein, will be needed. Finally, the biological 
functionality (not just the binding ability) of a 
SLiM should be confirmed by mutational analysis 
in a cellular environment, using a specialized 
in vivo setup. 

For weak and transient binary interactions, 
two-hybrid based (or similar) techniques might 
be good “first hint” screening methods that can 
also be automated. However, the initial dataset 
usually contains a high percentage of spurious 
hits (false positives) and misses most of the real 
binders (false negatives). Thus, it is imperative to 
confirm the results by other cell-based protein- 
protein interaction (PPI) assays to remove false 
positives. Experimental confirmation of all hits, 
similarly to in silico generated lists, may still be a 
demanding task requiring the testing of hundreds
of motifs. 

Sometimes, special methods need to be 
developed for motif detection and/or validation. 
This problem was also encountered in our laboratory
during the proteome-wide discovery of
mitogen-activated protein kinase (MAPK)
recruiting motifs [5]. Namely, we found that PPI
techniques applying washing steps are unsuitable
for the detection of these typically low-affinity
(1-20 µM) interactions. These motifs (including a
benchmarking set of well-known true positives) 
also consistently gave negative results in phage 
display. Thus, the kinase recruiting motifs could 
only be reliably identified with a custom assay, 
detecting the phosphorylation enhancement for 
an artificial substrate. Similar enzymatic
labeling-based methods have since been effectively
used both in vitro (e.g., chemical modification
of peptide arrays by lysine deacetylase
fused proteins) and in cells (e.g., biotinylation of
partners by biotinyl transferase fused proteins)
[33, 34]. Enzymatic modification of binding
partners by a “perpetrator” protein in the custom 
assay can thus often be used to explore or confirm 
protein-protein interactions. The drawback is that 
such interactions might not always be direct. 
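
A back-of-the-envelope calculation helps to see why washing steps lose low-affinity binders. Using the standard relationship Kd = koff / kon, the half-life of a complex is ln(2) / koff. The Python sketch below evaluates this for a few dissociation constants. The association rate constant used here (10^6 M^-1 s^-1) is an assumed, order-of-magnitude value chosen for illustration, not a measured parameter of the MAPK-motif interactions, and the function name is hypothetical.

```python
import math

# Assumed association rate constant (M^-1 s^-1); illustrative only.
ASSUMED_KON = 1e6


def complex_half_life(kd_molar, kon=ASSUMED_KON):
    """Half-life (in seconds) of a 1:1 complex, from Kd = koff / kon."""
    koff = kd_molar * kon          # dissociation rate constant, s^-1
    return math.log(2) / koff      # first-order decay half-life


for label, kd in (("1 nM", 1e-9), ("1 uM", 1e-6), ("20 uM", 20e-6)):
    print(f"Kd = {label}: half-life ~ {complex_half_life(kd):.3g} s")
```

Under this assumption a nanomolar complex survives for minutes, whereas complexes in the 1-20 µM range dissociate in well under a second, far shorter than a typical washing step.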

For highly transient interactions, in principle, 
crosslinkers might also be used to “freeze” short- 
lived interactions [35]. Crosslinking between 
proteins can even be controlled by a photochemi- 
cal process and then mapped to surface sites with 
high precision using mass spectrometry [36]. How- 
ever, crosslinking can also be highly unspecific. A 
considerably milder version of trapping weak and 
transient interactions is implemented in protein- 
fragment complementation assays (PCA), where 
the bait and its prey are covalently bound to the 
fragments of a third (reporter) protein [37]. Upon 
interaction, the reporter protein assembles and 
recovers its own enzymatic activity or fluorescence.
When non-functional fragments of fluorescent
proteins (e.g., GFP, YFP) are reconstituted
after interaction, the method is called bimolecular 
fluorescence complementation assay (BiFC). 
BiFC has the advantage that it can also help to 
visualize the place of the interaction (nucleus, 
cytoplasm, cell membrane, or cytoskeleton) by 
fluorescence microscopy. Unfortunately, for
weaker interactions (>10 µM) or low protein
expression levels, the signal-to-noise ratio of 
BiFC is often poor, due to the autofluorescence 
of cells and standard cell culture media. In this case 
a split luciferase complementation assay tends 
to give better results, as its enzymatic activity 
multiplies the signal [37]. Moreover, cells without 
active luciferase do not give any background 
luminescence. 

For a good PCA experiment the following 
details need to be considered: 


1. Choose an appropriate reporter system 
(enzyme or fluorescent protein) and reporter 
protein (Fig. 5.3a). For example, in the case of 
a simple split luciferase complementation 
assay, several differently split constructs are 
available, with variable stability of the com- 
plex based on fragment lengths [38]. 

2. Try to place reporter fragments optimally 
(to the N-terminus or C-terminus of the protein 
of interest, POI). At least eight different fusion 
proteins are possible for one interacting pair of 
proteins [38, 39]. The best practice is to test 
every possibility; however, there are ways to 
predict which combinations are going to be 
highly functional: 

(a) If the 3D structure of a POI is known,
then the optimal orientation of the
complementing fragments can be
guessed at.

(b) The reporter fragments need to be as
close to the interaction site as possible
(in both proteins), otherwise the signal
will be low.

(c) Fusing a reporter fragment to the
N-terminus might enhance the expression
of our POI, so it is advantageous to choose
an N-terminal fragment for a POI with low
expression, while a C-terminal fragment
for a POI with high expression.

(d) The creation of the fusion protein must
not significantly alter the localization, stability,
and biological function of POIs.

3. Determine the length and sequence of the
linker between the protein and its reporter
fragment. The linkers should be
conformationally neutral, hydrophilic, and
flexible enough to ensure the proper fit of
fusion partners upon protein-protein complex
formation and reconstitution of the reporter.
Normally, a (GGSGGS)n sequence with
n = 1-3 is sufficient.

4. Choosing the right negative controls is possibly
the most important aspect of PCA based
validation experiments. This is because fluorescent
protein fragments can associate and
give false positive signal in the absence of
bait-prey interaction, particularly under high
expression levels. (Sometimes reporter
fragments without the POIs give higher signal
than the fusion protein, since reporter
fragments have higher expression level in the
cell than the fusion construct containing a
large POI.) Therefore, it is obligatory to
include controls to ensure that the enzymatic
activity or fluorescence is not due to unspecific
effects. The best control is the same globular
POI with its linear motif binding surface
mutated. Alternatively, key residues of the
motif can also be mutated in the linear motif
containing partner, or the entire motif can be
removed (Fig. 5.3b).

Fig. 5.3 Cellular PCA based techniques in practice. (a)
Interactions of p38α MAP kinase with docking motif
(D-motif) containing partners were examined with
N-terminal YFP fragment fusions in a fluorescent protein
based PCA experiment. (b) Fluorescence microscopy
already shows a profound difference between wild-type
and mutant proteins lacking the linear motif.
(c) Differences in fluorescence are quantifiable in
experiments if all fragments are expressed at a similar
extent, as shown by the Western-blots of samples derived
from HEK293 cells transiently transfected with the YFP
fragments [5]
[Figure panel labels: p38α + MKK6 or AAKG2, wt vs. Δmotif; y-axis: fluorescence intensity (arbitrary units); western blots: anti-FLAG and anti-p38α]

5. Test the expression levels of the fusion proteins 
(in all cases). This is especially important when 
using the PCA signal to estimate relative 
interaction strengths of different preys binding
to the same bait. However, one should also be
aware that interactions also depend on the localization
and expression of the POIs, and all of these
affect the reconstitution of the reporter. Including
the same epitope tag (e.g., FLAG, V5) in
all the expression constructs can help to quan- 
tify protein expression levels (Fig. 5.3c). 

6. A major drawback of PCA based methods is 
that they are not well-suited to observe PPI 
dynamics, as most fusion tags permanently 
hold bait and prey together once the reporter 
is folded (which is nevertheless a great advan- 
tage to detect weak and transient interactions). 
Recently, however, fully reversible luciferase
complementation assays have also been developed.
These can be used to follow the
dynamics of a transient interaction relevant to cellular
signaling, for example by measuring complex
formation under basal and
stimulated conditions [40]. 
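
The construct-design steps in the list above (points 2 and 3) can be laid out as a small planning script. The Python sketch below enumerates the eight possible fusion proteins for one bait-prey pair (two proteins, two reporter fragments, two termini), lists the complementary pairings that could be co-expressed, and generates (GGSGGS)n linkers for n = 1-3. All names (`bait`, `prey`, `F1`, `F2`, and the function names) are hypothetical placeholders; the script only helps with bookkeeping and does not predict which combination will work best.

```python
from itertools import product


def fusion_constructs(proteins=("bait", "prey"),
                      fragments=("F1", "F2"),
                      termini=("N", "C")):
    """All single fusion proteins: protein x reporter fragment x terminus."""
    return [
        {"protein": p, "fragment": f, "terminus": t}
        for p, f, t in product(proteins, fragments, termini)
    ]


def complementary_pairings(fragments=("F1", "F2"), termini=("N", "C")):
    """Bait/prey combinations carrying complementary reporter fragments."""
    pairings = []
    for bait_fragment, prey_fragment in (fragments, fragments[::-1]):
        for bait_terminus, prey_terminus in product(termini, repeat=2):
            pairings.append(((bait_fragment, bait_terminus),
                             (prey_fragment, prey_terminus)))
    return pairings


def linker(n, unit="GGSGGS"):
    """Flexible linker made of n repeats of the given unit."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


constructs = fusion_constructs()
pairings = complementary_pairings()
linkers = {n: linker(n) for n in (1, 2, 3)}
print(len(constructs), "fusion proteins;", len(pairings), "pairings to test")
print(linkers)
```

Each pairing would additionally need its matched negative control (for example, a motif-deleted partner), as stressed in point 4.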


5.5 Linear Motif Mediated PPI Dynamics


From a functional perspective, it is important to 
note that linear binding motif mediated PPIs are 
often affected by post-translational modifications 
[6, 41]. There are several ways protein phosphor- 
ylation mechanistically affects PPI dynamics. The 
best-known case is when phosphorylation creates 
a binding site for a globular domain (e.g., SH2
domains for phosphotyrosine-containing linear motifs).
This can be referred to as a clear ON switch. 
However, if the affinity constants change only to 
a lesser degree (less than about an order of mag- 
nitude), then a positive phosphoswitch may be 
regarded as an ON dimmer, where phosphoryla- 
tion has a more graded effect on linear motif 
mediated binding. Conversely, there are many 
examples for OFF switches (loss of binding due 
to full steric clash) or OFF dimmers (decrease of 
affinity), where phosphorylation interferes with 
protein-peptide type interactions (Fig. 5.4a). 
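
The switch and dimmer categories described above can be expressed as a simple rule on measured affinities. The Python sketch below compares the dissociation constants of the unmodified and phosphorylated motif and applies a tenfold cutoff, following the "about an order of magnitude" criterion in the text. This is a simplification under stated assumptions: the function name and the exact threshold are hypothetical, and a true ON or OFF switch is defined mechanistically (creation of a binding site or a full steric clash), not merely by a fold change.

```python
def classify_phosphoswitch(kd_unmodified, kd_phosphorylated, switch_fold=10.0):
    """Classify the effect of phosphorylation from two dissociation constants.

    Both Kd values must use the same unit. `switch_fold` is an assumed
    cutoff separating switches from dimmers.
    """
    if kd_unmodified <= 0 or kd_phosphorylated <= 0:
        raise ValueError("dissociation constants must be positive")
    # fold > 1 means tighter binding after phosphorylation
    fold = kd_unmodified / kd_phosphorylated
    if fold >= switch_fold:
        return "ON switch"
    if fold > 1:
        return "ON dimmer"
    if fold == 1:
        return "no effect"
    if fold > 1 / switch_fold:
        return "OFF dimmer"
    return "OFF switch"


# Made-up affinities in micromolar, for illustration only.
examples = {
    "strong gain": classify_phosphoswitch(100.0, 1.0),
    "mild gain": classify_phosphoswitch(10.0, 3.0),
    "mild loss": classify_phosphoswitch(1.0, 5.0),
    "strong loss": classify_phosphoswitch(1.0, 500.0),
}
print(examples)
```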

De novo identification of linear motifs, 
whose activity strictly depends on covalent 
modifications (canonical ON switches) often 
poses a great technical difficulty. These
interactions are unlikely to form in heterologous 
biological systems (e.g., the “usual” Y2H assays 
or phage displays) and are invisible to peptide 
arrays unless they were synthesized with 
modified amino acids. To circumvent such 
challenges, specialized screens are necessary. 
Introduction of mammalian tyrosine kinases into 
yeast strains (phospho-tyrosine Y2H) or direct 
phosphorylation of phage surface epitopes with 
kinase enzymes prior to their selection are just a 
few solutions that can be applied for this difficult 
scenario [42, 43]. For in vitro screens,
phosphorylated linear motifs can be generated
conventionally, using either specific kinases or
chemical synthesis; but certain modified amino
acids (e.g., phospho-Ser) might also be
incorporated as an unnatural amino acid during
translation of the bait [44].

In cells, the dynamics of linear motif mediated
protein complex formation may also be followed
by conventional methods. In our example, the
dissociation of ribosomal S6 subunit kinase
1 (RSK1) from its PDZ domain-containing scaffold
protein (membrane associated guanylate
kinase inverted 1, MAGI-1) upon epidermal
growth factor (EGF) treatment was monitored
by a luciferase based PCA experiment. EGF treatment
results in the activation of RSK1. Activated
RSK1 then autophosphorylates its
C-terminal region at Thr733, inside its PDZ binding
linear motif. This phosphorylation event acts
as an OFF switch, as it sterically disrupts the
complex (Fig. 5.4b). RSK1-MAGI-1 association
dips at maximal RSK1 activity, and dephosphorylation
at this site by phosphatases allows
re-assembly of the complex [7].


5.6 Conclusion 

In recent years, the robustness and reliability of 
interactomics discovery tools have improved dra- 
matically. Despite this evolution, there is no 
one-size-fits-all experimental method for all 
problems. The golden rule is thus to combine 
different methods (including generic and specific 
assays) to obtain the most reliable results 
possible. The same is true for purely in silico
methods: they work best when coupled with
experimental testing, and a sequential approach
can be very powerful. Thus, one can use
predictions to obtain a putative interactor set,
which would then be tested experimentally. The
results can be utilized to improve predictions, and
the process can be repeated until it converges to a
set of well-defined linear motifs.

Fig. 5.4 Linear motif-based switches modulated by phosphorylation.
(a) Classification of phosphoswitches. Phosphorylation
might have a profound effect on protein-
peptide type interactions, either by promoting it
(ON switch), disrupting it (OFF switch) or by altering
the binding affinity in a more gradual fashion (ON or
OFF dimmer). (b) As an example of dynamic regulation,
the formation of the RSK1-MAGI-1 complex is controlled by
PDZ domain binding and PDZ ligand phosphorylation
(on Thr733). The crystal structure of the RSK1
C-terminus (green) in complex with the MAGI-1 PDZ
domain (brown) is shown on the upper panel. The lower
panel shows the results of a luciferase complementation
assay, where the complex formation between RSK1 and
MAGI-1 was monitored in unstimulated (black) and EGF
stimulated (gray) HEK293 cells [7]

There are possibly millions of PPIs in the cell
that orchestrate cellular events at the molecular
level. How can we obtain a biological understanding
of these complex PPI networks? What is the
role of a given linear binding motif in cellular
regulation? Linear motif discovery tools clearly
highlighted the abundance of SLiMs in the
human proteome. These motifs seem to be particularly
abundant in regulatory systems, such as in
proteins involved in cellular signaling, in the control
of protein degradation and gene expression.
Mutations of linear motifs are also seen in diseases
such as cancer [45, 46]. At the same time, domain
surfaces accommodating SLiMs are attractive
targets for novel pharmaceutical agents [47]. However,
before SLiMs and their partners can be
utilized for biotechnology, medical diagnostics, or
therapy, their behavior needs to be untangled.
In many proteins, linear motifs form complicated
networks, where secondary modifications might
trigger each other in a sequential order
[48, 49]. By now it has become well established
that in dynamic regulatory networks (particularly
those involved in cellular signaling, protein level, and
gene expression regulation) phosphoswitches,
phosphodegrons, or phosphorylation-dependent
SUMOylation sites are used as simple building
blocks. Considering the great progress made in
the field in the last two decades, a more complete
exploration of linear motif-controlled protein-protein
associations within these networks appears to
be a tractable challenge for scientists.


Acknowledgements This work was supported by grants 
from the National Research, Development and Innovation 
Office, Hungary (OTKA NN 114309, OTKA PD120973 
and KKP_17 126963). 




References

1. Tompa P, Davey NE, Gibson TJ, Babu MM (2014) A million peptide motifs for the molecular biologist. Mol Cell 55:161-169. https://doi.org/10.1016/j.molcel.2014.05.032
2. Van Roey K, Uyar B, Weatheritt RJ et al (2014) Short linear motifs: ubiquitous and functionally diverse protein interaction modules directing cell regulation. Chem Rev 114:6733-6778. https://doi.org/10.1021/cr400585q
3. Mészáros B, Simon I, Dosztányi Z (2009) Prediction of protein binding regions in disordered proteins. PLoS Comput Biol 5:e1000376. https://doi.org/10.1371/journal.pcbi.1000376
4. Gouw M, Michael S, Sámano-Sánchez H et al (2018) The eukaryotic linear motif resource - 2018 update. Nucleic Acids Res 46:D428-D434. https://doi.org/10.1093/nar/gkx1077
5. Zeke A, Bastys T, Alexa A et al (2015) Systematic discovery of linear binding motifs targeting an ancient protein interaction surface on MAP kinases. Mol Syst Biol 11:837
6. Van Roey K, Dinkel H, Weatheritt RJ et al (2013) The switches.ELM resource: a compendium of conditional regulatory interaction interfaces. Sci Signal 6:rs7. https://doi.org/10.1126/scisignal.2003345
7. Gógl G, Biri-Kovacs B, Poti AL et al (2018) Dynamic control of RSK complexes by phosphoswitch-based regulation. FEBS J 285:46-71. https://doi.org/10.1111/febs.14311
8. Vidal M, Fields S (2014) The yeast two-hybrid assay: still finding connections after 25 years. Nat Methods 11:1203-1206
9. Dunham WH, Mullin M, Gingras A-C (2012) Affinity-purification coupled to mass spectrometry: basic principles and strategies. Proteomics 12:1576-1590. https://doi.org/10.1002/pmic.201100523
10. Bonetta L (2010) Interactome under construction. Nature 468:851-852. https://doi.org/10.1038/468851a
11. Blikstad C, Ivarsson Y (2015) High-throughput methods for identification of protein-protein interactions involving short linear motifs. Cell Commun Signal 13:38. https://doi.org/10.1186/s12964-015-0116-8
12. Volkmer R, Tapia V, Landgraf C (2012) Synthetic peptide arrays for investigating protein interaction domains. FEBS Lett 586:2780-2786. https://doi.org/10.1016/j.febslet.2012.04.028
13. Davey NE, Seo M-H, Yadav VK et al (2017) Discovery of short linear motif-mediated interactions through phage display of intrinsically disordered regions of the human proteome. FEBS J 284:485-498. https://doi.org/10.1111/febs.13995
14. Larman HB, Zhao Z, Laserson U et al (2011) Autoantigen discovery with a synthetic human peptidome. Nat Biotechnol 29:535-541. https://doi.org/10.1038/nbt.1856

15. Ivarsson Y, Arnold R, McLaughlin M et al (2014)
Large-scale interaction profiling of PDZ domains
through proteomic peptide-phage display using
human and viral phage peptidomes. Proc Natl Acad
Sci 111:2542-2547.
https://doi.org/10.1073/pnas.1312296111

16. Karimova G, Gauliard E, Davi M et al (2017)
Protein-protein interaction: bacterial two-hybrid. In:
Methods in molecular biology. Humana Press, Clifton,
pp 159-176

17. Riegel E, Heimbucher T, Höfer T, Czerny T (2017) A
sensitive, semi-quantitative mammalian two-hybrid
assay. BioTechniques 62:206-214.
https://doi.org/10.2144/000114544

18. Cherf GM, Cochran JR (2015) Applications of yeast
surface display for protein engineering. Methods Mol
Biol 1319:155-175.
https://doi.org/10.1007/978-1-4939-2748-7_8

19. Michnick SW, Ear PH, Landry C et al (2011)
Protein-fragment complementation assays for large-scale
analysis, functional dissection and dynamic studies of
protein-protein interactions in living cells. In:
Methods in molecular biology. Humana Press, Clifton,
pp 395-425

20. Cabantous S, Nguyen HB, Pedelacq J-D et al (2013) A
new protein-protein interaction sensor based on
tripartite split-GFP association. Sci Rep 3:2854.
https://doi.org/10.1038/srep02854

21. To T-L, Zhang Q, Shu X (2016) Structure-guided
design of a reversible fluorogenic reporter of
protein-protein interactions. Protein Sci 25:748-753.
https://doi.org/10.1002/pro.2866

22. Lemmens I, Lievens S, Tavernier J (2015) MAPPIT, a
mammalian two-hybrid method for in-cell detection of
protein-protein interactions. Methods Mol Biol
1278:447-455.
https://doi.org/10.1007/978-1-4939-2425-7_29

23. Gibson TJ, Dinkel H, Van Roey K, Diella F (2015)
Experimental detection of short regulatory motifs in
eukaryotic proteins: tips for good practice as well as
for bad. Cell Commun Signal 13:42.
https://doi.org/10.1186/s12964-015-0121-y

24. Edwards RJ, Palopoli N (2015) Computational
prediction of short linear motifs from protein
sequences. In: Methods in molecular biology. Humana
Press, Clifton, pp 89-141

25. de Castro E, Sigrist CJA, Gattiker A et al (2006)
ScanProsite: detection of PROSITE signature matches
and ProRule-associated functional and structural
residues in proteins. Nucleic Acids Res 34:W362-W365.
https://doi.org/10.1093/nar/gkl124

26. Krystkowiak I, Davey NE (2017) SLiMSearch: a
framework for proteome-wide discovery and annotation
of functional modules in intrinsically disordered
regions. Nucleic Acids Res 45:W464-W469.
https://doi.org/10.1093/nar/gkx238

27. Davey NE, Shields DC, Edwards RJ (2006)
SLiMDisc: short, linear motif discovery, correcting
for common evolutionary descent. Nucleic Acids Res
34:3546-3554. https://doi.org/10.1093/nar/gkl486

28. Bailey TL, Boden M, Buske FA et al (2009) MEME
SUITE: tools for motif discovery and searching.
Nucleic Acids Res 37:W202-W208.
https://doi.org/10.1093/nar/gkp335

29. Haslam NJ, Shields DC (2012) Profile-based short
linear protein motif discovery. BMC Bioinform 13:104.
https://doi.org/10.1186/1471-2105-13-104

30. Prytuliak R, Volkmer M, Meier M, Habermann BH
(2017) HH-MOTIF: de novo detection of short linear
motifs in proteins by hidden Markov model
comparisons. Nucleic Acids Res 45:10921-10921.
https://doi.org/10.1093/nar/gkx810

31. Hertz EPT, Kruse T, Davey NE et al (2016) A
conserved motif provides binding specificity to the
PP2A-B56 phosphatase. Mol Cell 63:686-695.
https://doi.org/10.1016/j.molcel.2016.06.024

32. Sanchez IE, Beltrao P, Stricher F et al (2008)
Genome-wide prediction of SH2 domain targets using
structural information and the FoldX algorithm. PLoS
Comput Biol 4:e1000052.
https://doi.org/10.1371/journal.pcbi.1000052

33. O’Kane PT, Mrksich M (2017) An assay based on
SAMDI mass spectrometry for profiling protein
interaction domains. J Am Chem Soc 139:10320-10327.
https://doi.org/10.1021/jacs.7b03805

34. Beck DB, Narendra V, Drury WJ et al (2014) In vivo
proximity labeling for the detection of protein-protein
and protein-RNA interactions. J Proteome Res
13:6135-6143. https://doi.org/10.1021/pr500196b

35. Yang B, Tang S, Ma C et al (2017) Spontaneous and
specific chemical cross-linking in live cells to capture
and identify protein interactions. Nat Commun 8:2240.
https://doi.org/10.1038/s41467-017-02409-z

36. Pham ND, Parker RB, Kohler JJ (2013)
Photocrosslinking approaches to interactome
mapping. Curr Opin Chem Biol 17:90-101.
https://doi.org/10.1016/j.cbpa.2012.10.034

37. Morell M, Ventura S, Avilés FX (2009) Protein
complementation assays: approaches for the in vivo
analysis of protein interactions. FEBS Lett
583:1684-1691.
https://doi.org/10.1016/j.febslet.2009.03.002

38. Kudla J, Bock R (2016) Lighting the way to
protein-protein interactions: recommendations on best
practices for bimolecular fluorescence complementation
analyses. Plant Cell 28:1002-1008.
https://doi.org/10.1105/tpc.16.00043


39. Kerppola TK (2006) Design and implementation of
bimolecular fluorescence complementation (BiFC)
assays for the visualization of protein interactions in
living cells. Nat Protoc 1:1278-1286.
https://doi.org/10.1038/nprot.2006.201

40. Dixon AS, Schwinn MK, Hall MP et al (2016)
NanoLuc complementation reporter optimized for
accurate measurement of protein interactions in cells.
ACS Chem Biol 11:400-408.
https://doi.org/10.1021/acschembio.5b00753

41. Wu C-G, Chen H, Guo F et al (2017) PP2A-B’
holoenzyme substrate recognition, regulation and role
in cytokinesis. Cell Discov 3:17027.
https://doi.org/10.1038/celldisc.2017.27

42. Grossmann A, Benlasfer N, Birth P et al (2015)
Phospho-tyrosine dependent protein-protein
interaction network. Mol Syst Biol 11:794

43. Shah NH, Löbel M, Weiss A, Kuriyan J (2018)
Fine-tuning of substrate preferences of the Src-family
kinase Lck revealed through a high-throughput
specificity screen. eLife 7.
https://doi.org/10.7554/eLife.35190

44. Rogerson DT, Sachdeva A, Wang K et al (2015)
Efficient genetic encoding of phosphoserine and its
nonhydrolyzable analog. Nat Chem Biol 11:496-503.
https://doi.org/10.1038/nchembio.1823

45. Uyar B, Weatheritt RJ, Dinkel H et al (2014)
Proteome-wide analysis of human disease mutations
in short linear motifs: neglected players in cancer? Mol
BioSyst 10:2626-2642.
https://doi.org/10.1039/C4MB00290C

46. Mészaros B, Zeke A, Reményi A et al (2016)
Systematic analysis of somatic mutations driving
cancer: uncovering functional protein regions in disease
development. Biol Direct 11:23.
https://doi.org/10.1186/s13062-016-0125-6

47. Corbi-Verge C, Kim PM (2016) Motif mediated
protein-protein interactions as drug targets. Cell
Commun Signal 14:8.
https://doi.org/10.1186/s12964-016-0131-4

48. Yu S, Wang F, Tan X et al (2018) FBW7 targets
KLF10 for ubiquitin-dependent degradation. Biochem
Biophys Res Commun 495:2092-2097.
https://doi.org/10.1016/j.bbrc.2017.11.187

49. Hietakangas V, Anckar J, Blomster HA et al (2006)
PDSM, a motif for phosphorylation-dependent SUMO
modification. Proc Natl Acad Sci 103:45-50.
https://doi.org/10.1073/pnas.0503698102




Jorge Santos-Lopez, Sara Gómez, Francisco J. Fernandez,
and M. Cristina Vega


Abstract 


The specific kinetics and thermodynamics 
of protein-protein interactions underlie the 
molecular mechanisms of cellular functions; 
hence the characterization of these interaction 
parameters is central to the quantitative under- 
standing of physiological and pathological 
processes. Many methods have been devel- 
oped to study protein-protein interactions, 
which differ in various features including the 
interaction detection principle, the sensitivity, 
whether the method operates in vivo, in vitro, 
or in silico, the temperature control, the use of 
labels, immobilization, the amount of sample 
required, the number of measurements that can 
be accomplished simultaneously, or the cost. 
Bio-Layer Interferometry (BLI) is a label-free 
biophysical method to measure the kinetics 
of protein-protein interactions. Label-free
interaction assays are a broad family of
methods that do not require protein
modifications (other than immobilization)
or labels such as fusions with fluorescent
proteins or transactivating domains or chemical
modifications like biotinylation or reaction
with radionuclides. Besides BLI, other
label-free techniques that are widely used
for determining protein-protein interactions
include surface plasmon resonance (SPR),
thermophoresis, and isothermal titration
calorimetry (ITC), among others.


Keywords 


Biolayer interferometry - Label-free 
techniques - Protein-protein interactions - 
Binding kinetics - Complement system - C5a 
anaphylatoxin - Antibody - Streptavidin 
biosensor 


6.1 Biolayer Interferometry 

BLI is an in vitro optical technique that records
molecular association and dissociation kinetics at
the tip of a biosensor by white-light interferometry
[1-4]. In detail, the binding and detachment of a
protein analyte are monitored by measuring changes
in the interference pattern of white light (obtained
from a tungsten lamp) reflected by two layers at the
biosensor tip: the surface of the biosensor tip
where molecules attach (variable) and an internal
reference layer (constant). Any change in the
thickness of the molecular layer on the surface
of the sensor tip, by molecular coupling or
disengaging, alters the optical path length of this
layer and therefore shifts the interference pattern
of the white light reflected from the two surfaces.
This shift can be measured in real time by a detector
and is reported as a wavelength shift (Δλ) in nm,
which is the magnitude of the binding response
plotted versus time in BLI sensorgrams (Fig. 6.1).
Importantly, the BLI signal is sensitive to molecules
that associate with or dissociate from the biosensor
tip surface, either directly or via other molecules
previously bound to this surface. Accordingly, both
qualitative and quantitative information about
molecular interactions and sample concentration can
be obtained in BLI experiments.

Several BLI platforms have been developed
that display slightly different specifications and
capabilities (Fig. 6.2) [3-5]. Overall, low sample
quantities are required for an assay, from about
200 µL down to only 4 µL. Moreover, samples
can be recovered after the assay and used in
further experiments. High sensitivity can be
reached by these platforms, with a lower limit
for the molecular size that can be detected that
varies from 150 Da to 10,000 Da, and an affinity
range from 0.1 mM to 10 pM (Octet systems) or
from 1 mM to 0.1 nM (BLItz system). Regarding
the number of channels, BLI platforms offer
flexibility. While some platforms can only read one
or two samples at the same time using one or
two channels, some Octet systems have 8 or
16 channels, thereby enabling high-throughput
approaches.

BLI biosensors are disposable fiber-optic
devices whose tips are made from a biocompatible
matrix that is uniform, non-denaturing, and
minimizes non-specific binding. The tips are
conveniently derivatized so that they can bind
biomolecules such as proteins or nucleic acids
covalently or non-covalently. An important
requisite for BLI assays is preserving the molecular
activity of the biomolecules while attached to the
biosensor. Multiple commercial biosensors are
available according to the tip derivatization,
binding features, and intended or potential use.
These include biosensors coated with specific
antibodies, Ni-NTA, streptavidin, or proteins A,
G, or L, as well as some amine-reactive
biosensors [3, 4].

Generally, BLI biosensors are divided into 
three categories depending on their application: 
binding assays, quantitation assays, or both. The 
suitable application for a given biosensor is 
established by several sensor properties: molecu- 
lar binding specificity, stability, and affinity; 
assay reproducibility; and the quantitation 
dynamic range. Although biosensor disposal is 
recommended after one use, some biosensors 
may be regenerated by removing the molecules 
captured on their tips. Regeneration procedures 
aim to detach the highest quantity possible of the 
attached molecules while preserving the quality 
and the proper function of the biosensor tip (bio- 
compatible matrix, its derivatization, and, if any, 
other captured molecules that should remain 
retained). The number of use-regeneration cycles 
compatible with a biosensor depends on the bio- 
sensor type and the application, and should be 
determined experimentally [3, 4]. 

BLI assays are carried out employing a dip-and- 
read approach, in which the tip of a biosensor is 
sequentially submerged in solutions with or with- 
out the molecules assayed for binding. The solu- 
tion support is in constant rotation to avoid the 
formation of concentration gradients and to prevent 
rebinding events [6]. The interactions analyzed in 
BLI experiments can involve different types of 
biomolecules, including proteins, nucleic acids, 
lipids, peptides, and small molecules. 

According to the type of information obtained, 
two types of BLI experiments can be 
distinguished [1, 4]: binding and quantitation 
experiments. 


6.1.1 Binding Experiments 

In binding experiments, qualitative and quantitative
information about kinetic and equilibrium
properties of molecular interactions is sought.
Typically, the interaction studied occurs between
two molecules that are sequentially exposed to
the biosensor. The first interacting molecule,
referred to as ligand or bait, is immobilized onto
the biosensor tip surface with high affinity. Next,
binding to and detaching from the sensor tip of
the other molecule of the pair, referred to as the
analyte, is monitored. A set of association and
dissociation curves (sensorgrams) is acquired by
repeating the experiment with different analyte
concentrations. From this set of sensorgrams, the
association rate constant (ka) and the dissociation
rate constant (kd) of the interaction can be
obtained by fitting the data with an adequate
binding model, and then the binding affinity
constants (KD and KA) can be calculated [7-9].
As an alternative to the standard kinetic strategy,
approaches have been developed to derive the
equilibrium affinity constants without measuring
the kinetic parameters. These approaches can be
helpful to determine equilibrium constants of
interactions with complex kinetics or that
are disrupted by the immobilization step
[7, 10]. Using either kinetic or equilibrium
strategies, BLI binding experiments have been
applied to elucidate the structural basis of
macromolecular interactions and to identify,
compare, and validate binders and interaction
modulators targeting specific proteins and
protein complexes (see examples in Sect. 6.3).

Fig. 6.1 Principle of the BLI assay. (a) The sensor tip
coated with an interacting partner (ligand) is sequentially
dipped into assay buffer, a solution of the interacting
partner (analyte), and assay buffer in the baseline,
association and dissociation phases, respectively.
(b) Throughout the assay, the interference pattern of
reflected light from the reference layer and the molecular
layer onto the biocompatible surface is continuously
monitored. During the association and dissociation phases,
the molecular layer’s thickness changes. These changes are
tracked by measuring shifts in the interference pattern of
the reflected light compared to the pattern observed during
the baseline phase (denoted as Δλ). (c) The binding curve
for the BLI assay is derived by plotting the wavelength
shift in reflected light against time

Fig. 6.2 Two commercially available BLI platforms.
(a) FortéBio BLItz system. Photographs taken by
Dr. Sergio Navas-Yuste (CIB-CSIC). (b) FortéBio Octet
R8 system. Images were kindly provided by Dr. Alberto
Marina (IBV-CSIC)

Examples of biosensors suitable for
binding assays include streptavidin (SA),
aminopropylsilane (APS), amine reactive 2nd
generation (AR2G), or anti-human Fc capture
sensors [11].
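
As a rough illustration of the kinetic strategy used in
binding experiments, the sketch below simulates an idealized
sensorgram for a 1:1 (Langmuir) interaction and derives the
affinity constants from the rate constants. The 1:1 model, the
function names, and all numerical values (ka, kd, Rmax,
concentration, step durations) are assumptions chosen for
illustration; they are not taken from this chapter.

```python
import math

def association(t, conc, ka, kd, rmax):
    """Response (nm) at time t (s) after dipping into analyte at conc (M)."""
    kobs = ka * conc + kd                  # observed rate constant (1/s)
    req = rmax * conc / (conc + kd / ka)   # steady-state response at this conc
    return req * (1.0 - math.exp(-kobs * t))

def dissociation(t, r0, kd):
    """Response (nm) at time t (s) after returning to buffer from level r0."""
    return r0 * math.exp(-kd * t)

# Hypothetical parameters, for illustration only
ka = 1.0e5     # association rate constant, 1/(M s)
kd = 1.0e-3    # dissociation rate constant, 1/s
rmax = 2.0     # maximal response at saturation, nm

KD = kd / ka   # equilibrium dissociation constant, M
KA = ka / kd   # equilibrium association constant, 1/M

conc = 100e-9                                    # analyte concentration, M
r_end = association(300.0, conc, ka, kd, rmax)   # end of a 300 s association
r_diss = dissociation(300.0, r_end, kd)          # after 300 s in buffer

print(f"KD = {KD:.1e} M, KA = {KA:.1e} 1/M")
print(f"signal at end of association: {r_end:.3f} nm")
print(f"signal at end of dissociation: {r_diss:.3f} nm")
```

Repeating the calculation for several analyte concentrations
gives the family of curves that a fitting routine would compare
against measured sensorgrams.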


6.1.2 Quantitation Experiments 

In this type of experiment, we seek to determine
the concentration of a macromolecule in a sample.
These assays require a BLI experimental
design sensitive to a molecular interaction
involving the molecule to quantify (analyte) and a
standard solution series of the analyte. For example,
the analyte could be known to bind, in a
dose-dependent (quantitative) fashion, the unmodified
surface of a certain biosensor or a specific
ligand that can be captured on the biosensor’s tip.
Once a suitable experimental design is
selected, BLI binding curves are measured for all
available analyte concentrations. From these
binding curves, a standard curve is generated by
plotting the BLI signal at a specific time point or
the initial association rate against the known
analyte concentration; to determine the analyte
concentration in the unknown samples, only
the association kinetic phase is necessary. Next,
the analyte concentrations of the unknown
samples are determined by interpolating their
corresponding measurements in the standard
curve previously prepared [12-16]. Remarkably,
the high specificity of the technique allows the use
of samples that are not completely purified and the
differentiation between active and inactive proteins.
Anti-human IgG Fc, anti-murine IgG Fc, and
protein A biosensors are examples of those
available for quantitative purposes [17].
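
The interpolation step can be sketched as follows. This is a
minimal sketch, assuming that the chosen readout (here, a
hypothetical initial binding rate in nm/s) increases
monotonically with analyte concentration and that
piecewise-linear interpolation between standards is adequate;
a real assay may call for a fitted curve model instead. The
function name and all numbers are invented for illustration.

```python
def interpolate_concentration(signal, standards):
    """Estimate a concentration from a signal by piecewise-linear interpolation.

    standards: list of (concentration, signal) pairs from the standard series.
    Returns None when the signal lies outside the range covered by the standards.
    """
    pts = sorted(standards, key=lambda p: p[1])   # sort by signal
    for (c_lo, s_lo), (c_hi, s_hi) in zip(pts, pts[1:]):
        if s_lo <= signal <= s_hi:
            frac = (signal - s_lo) / (s_hi - s_lo)
            return c_lo + frac * (c_hi - c_lo)
    return None   # outside the standard range: do not extrapolate

# Hypothetical standard series: (concentration in ug/mL, initial rate in nm/s)
standards = [(1.0, 0.010), (5.0, 0.048), (10.0, 0.092),
             (50.0, 0.390), (100.0, 0.700)]

unknown_rate = 0.070
estimate = interpolate_concentration(unknown_rate, standards)
print(f"estimated concentration: {estimate:.2f} ug/mL")
```

Returning None for out-of-range signals is a deliberate choice
in this sketch: samples outside the range of the standards
would be diluted or concentrated and measured again rather
than extrapolated.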


6.1.3 BLI in Comparison with Other Techniques


Compared to other techniques for the study of 
molecular interactions and quantitation [6, 18, 
19], BLI displays some remarkable features. 

BLI can be used to probe macromolecular 
interactions in rather crude or particulate samples, 
such as cell lysates, serum, periplasmic extracts, 
or solutions containing virus-like particles; this is
possible because the BLI signal hardly depends 
on unbound molecules or the refractive index 
[20-23]. Besides, the dip-and-read approach 
facilitates these measurements because it does 
not require microfluidics or other costly instru- 
mentation. Surface plasmon resonance (SPR), 
another label-free technique, records changes in 
the refractive index as the analyte binds the 
immobilized ligand and it does depend on 
microfluidics [1]; therefore, the SPR instrument 
can easily become clogged with crude samples. 

The simple dip-and-read method characteristic
of BLI measurements has been easily parallelized
for high-throughput applications, so binding data
can be collected faster and more efficiently than
with other techniques [8].

Compared to equilibrium binding assays
like ELISA, BLI has the advantage of directly
measuring the association and dissociation kinetic
constants, thus affording a deeper understanding
of the interaction. The kinetic dissociation
constant, kd, is regarded as one of the most important
parameters to assess the efficacy of a therapeutic.
Interactions with similar affinity can exhibit
markedly different association and dissociation
rates; therefore, experimentally determining the
kinetic constants provides more insight than the
KD value alone [24]. While the initial investment in
BLI instruments and biosensors can be substantial, the
overall maintenance costs for BLI platforms are
relatively low, rendering them cost-effective in
the long run [25, 26]. Furthermore, development
of BLI assays is straightforward, and comprehensive
kinetic data can be rapidly acquired.
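
To make the point about affinity versus kinetics concrete, the
sketch below compares two hypothetical binders that share the
same KD but differ 100-fold in their rate constants. It assumes
a simple 1:1 interaction, for which KD = kd/ka and the
half-life of the complex is ln 2/kd; the function name and the
numerical values are invented for illustration.

```python
import math

def complex_half_life(kd):
    """Half-life (s) of a 1:1 complex with dissociation rate constant kd (1/s)."""
    return math.log(2) / kd

# Two hypothetical binders: (ka in 1/(M s), kd in 1/s)
binders = {
    "fast on / fast off": (1.0e6, 1.0e-2),
    "slow on / slow off": (1.0e4, 1.0e-4),
}

for name, (ka, kd) in binders.items():
    KD = kd / ka   # identical for both binders in this example
    print(f"{name}: KD = {KD:.1e} M, half-life = {complex_half_life(kd):.0f} s")
```

An equilibrium assay would not distinguish these two binders,
whereas their sensorgrams would look very different.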

BLI presents several limitations that merit 
attention. A significant concern is that the binding 
event must take place on the biocompatible sur- 
face of the sensor, where one of the interaction 
partners is immobilized. As a result, it is impera- 
tive to validate that the observed interaction 
accurately reflects native binding events and is 
not affected by factors such as mass transfer 
limitations, preferential orientations, or confor- 
mational changes [27]. 

Another issue arises from analyte rebinding 
during the dissociation phase, an undesirable 
event exacerbated by the absence of a flow system 
to remove dissociated analyte. This phenomenon 
distorts the dissociation curve and complicates 
kinetic analysis, especially for slowly dissociating
interactions [8]. To mitigate this artifact, 
researchers have used “sink strategies” by 
introducing a specific competitor for the analyte 
in the dissociation buffer, thereby preventing the 
analyte from rebinding the ligand [7]. 

A problem with the basic BLI platforms (e.g.,
BLItz) is the lack of temperature control, which is an
important drawback compared to SPR
instruments and the more advanced BLI systems
such as the Octet RED96. The latter can control
the assay temperature between 15 and 40 °C.
Likewise, the absence of humidity control in
BLI platforms may be an issue with sensitive
samples or when the assay time is long enough to
cause substantial evaporation [8, 28, 29].

With respect to the reproducibility of the
results obtained by BLI, there are sources of
variability that must be borne in mind. Firstly,
the physical nature of the readout method and
features of the platform and biosensor impose
limitations in sensitivity for some applications;
for example, limited lower-size sensitivity may
restrict binding studies with small molecules
[7]. However, this lower-size limit could be
advantageous for the study of effects of small
allosteric modulators on the interaction between
two larger molecules [28]. Furthermore, some
experimental variation among biosensor batches
can be observed, and the binding parameters
determined can show a limited correlation with
those obtained by other techniques, such as SPR
[8, 29].


6.2 BLI Assay Development 

Assay development for BLI starts with a basic 
experimental design that takes into consideration 
the purpose (kinetics, quantitation, or both), the 
proteins involved in the interaction (or the protein 
and the small molecule or the nucleic acid binding 
partner), and the choice of sensor. 

To choose which binding partner will be 
immobilized (the ligand) and which one will be 
in solution (the analyte), we use some or all of the 
following criteria [2, 3, 30]: 


1. A protein partner should be the ligand if it has
a limited availability (as less sample is consumed
for the ligand than for the analyte), it is
already tagged (and there are BLI sensors for
the fusion/tag), it is the smaller of the two
binding partners (as the BLI signal increases with
the size of the analyte), and it can be safely and
stably immobilized on the sensor.

2. A protein partner should be the analyte when it
cannot be successfully purified or is available
only in complex mixtures.

3. Lastly, a ligand-analyte layout with the lowest
interaction valency might be preferred as this
simplifies data analysis. In other words, the
partner with the highest valency should be
immobilized, other factors being equal.

Several controls should be run when performing a
BLI experiment [3]:


1. Buffer compatibility: the BLI signal from a 
fresh sensor exposed to the assay buffer should 
be flat and stable (buffer control). 

2. Nonspecific binding of the analyte to the 
sensor tip: record the BLI signal from a fresh 
sensor (without ligand immobilized) exposed 
to the highest analyte concentration; the signal 
should be flat and constant, ideally zero (base- 
line) or close to zero. 

3. Nonspecific binding to the ligand: record the
BLI signal from a sensor, with ligand
immobilized, exposed to a solution with a
known non-binder (e.g., an isotype control
antibody, a scrambled peptide or nucleic acid).

4. Positive control: perform a complete assay
with a known binder, thus ensuring that the
ligand remains suitable for binding assays after
immobilization.

5. Buffer reference: data analysis requires
subtracting the background signal (the blank)
arising from all solution components except the
analyte. To obtain a blank curve, perform a complete
assay using a solution without analyte.


A simplified workflow for a BLI experiment 
contains the following steps, which have been 
described in greater detail elsewhere [3, 4, 26, 
28, 31]. 


6.2.1 Prepare Buffers, Regeneration Solution, Ligand, and Analyte Samples


A suitable assay buffer guarantees the stability of
both ligand and analyte and preserves the
interaction. It should be compatible with the sensor’s
biomatrix coating and capture molecules
(if present). Buffer compatibility with the BLI
signal should be confirmed experimentally,
especially if its composition is complex or unknown
(e.g., cell lysates, biological fluids). Nonspecific
binding to the sensor tip might contribute
significantly to the binding curves of some experiments,
particularly when working with weak interactions
and high concentrations of analyte [32]. In those
cases, blocking agents can be added to reduce
nonspecific binding; useful blocking agents are
1-2% (w/v) bovine serum albumin (BSA),
polyethylene glycol (PEG), nonfat milk, casein,
gelatin, non-ionic detergents like Tween
20/Polysorbate 20 (PS20), and saccharides, especially
sucrose [32, 33]. If detergents are used, they
should be added at working concentrations
below the critical micelle concentration.
Compatibility with BLI has been established for
common biochemicals; e.g., dimethylsulfoxide (DMSO)
is compatible [34].

Although the optical dip-and-read format
allows the use of crude extracts and particulate
material as the analyte in BLI experiments, it is
highly recommended that both ligand and analyte
samples be as pure and free of aggregates as
possible. Furthermore, the ligand and the analyte should
be dialyzed against or dissolved in assay buffer, and
their concentrations should be known accurately.

Ligand concentration for sensor coating
should be adjusted to avoid both under- and
oversaturated biosensors. Poorly saturated sensors
might show BLI signals that are too low in response
to incubation with the analyte. Conversely, sensors
loaded with too much ligand can show anomalous
BLI signals due to steric hindrance, masking of
binding surfaces, weak nonspecific interactions at
sufficiently high analyte concentrations, or
rebinding events at lower analyte concentrations.
To optimize ligand concentration,
at least three different concentrations covering
one order of magnitude (e.g., 5-50 µg/mL) should
be tested for the association and dissociation
kinetics of the analyte at a fixed concentration,
and the ligand concentration that yields an
acceptable nonsaturated BLI signal and adequate
kinetics should be chosen (Fig. 6.3a). If ligand binding
is low or poorly reproducible, overnight incubation
of the sensors in ligand solutions at 4 °C can
greatly improve immobilization [33, 35].

The analyte concentration range should also
be optimized experimentally. For quantitation
assays, it is critical to determine the sensor’s
dynamic range for the analyte before obtaining a
reliable standard curve. For kinetic experiments,
at least four analyte dilutions must be assayed
spanning a range of concentrations from 10-fold
above to 10-fold below the estimated KD. To
achieve this, it is convenient to prepare a
concentrated analyte stock and then make a serial
dilution of the analyte with a dilution factor of
2-3. However, the concentration range of analyte
may be limited by analyte solubility or assay
sensitivity; in those cases, the actual
concentration range of analyte can be adjusted based on
practical considerations.
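
A dilution series of this kind can be planned with a few lines
of code. The sketch below assumes a hypothetical estimated KD
of 10 nM and a 3-fold dilution factor; it starts 10-fold above
the KD and keeps diluting until the series reaches 10-fold
below it. The function name and values are illustrative only.

```python
def dilution_series(kd_estimate, factor=3.0, fold_range=10.0):
    """Concentrations from fold_range * KD down to at most KD / fold_range."""
    top = kd_estimate * fold_range
    bottom = kd_estimate / fold_range
    series = [top]
    while series[-1] > bottom:          # stop once the lower bound is covered
        series.append(series[-1] / factor)
    return series

# Estimated KD of 10 nM (assumed); concentrations are in nM
series_nM = dilution_series(10.0, factor=3.0)
print([round(c, 2) for c in series_nM])
```

A smaller dilution factor gives more points across the same
range at the cost of more sensors and more analyte.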


6.2.2 Hydrate the Biosensor in Assay Buffer to Minimize Nonspecific Signal


Biosensors should be hydrated for at least 10 min
before starting an assay, and they can be stored in
assay buffer overnight at 4 °C. After hydration,
their tips must not be allowed to dry out and should
only be in contact with the assay solutions to
maintain quality and functionality.


6.2.3 Maintain a Stable Assay Temperature


Allow sufficient time for all solutions
and biosensors to reach a stable temperature
before starting the assay. This is particularly
important for the BLI systems without tempera- 
ture control (e.g., BLItz). Samples can be 
maintained on ice up to the beginning of the 
assay. 


6.2.4 Immobilize the Ligand onto the Biosensor Surface

Monitoring the ligand immobilization is
recommended to determine the loading level and its
reproducibility, as well as the stability of the ligand
attached to the biosensor surface. The
immobilization step starts by recording the signal of a
fresh sensor in assay buffer until a stable baseline is
established. Next, the sensor tip is transferred to
the ligand solution at the chosen concentration
and the association kinetics is recorded until the
desired signal level is reached. Then, the sensor
tip is dipped back into assay buffer to remove
nonspecifically bound ligand and monitor the
stability of the BLI signal (Fig. 6.3a).

Fig. 6.3 Development of a BLI assay to characterize
the interaction between the C5a anaphylatoxin and an
antibody (Ab). The experiment was carried out on a BLItz
system at room temperature, immobilizing biotinylated
C5a onto streptavidin (SA) biosensors, and using the Ab
as analyte. The assay buffer consisted of 10 mM HEPES
(pH 7.4), 150 mM NaCl, 0.34 mM EDTA, supplemented
with 0.02% (w/v) PS20 as indicated. Compatibility of the
buffer with the BLI assay was previously confirmed (data
not shown). The steps of the immobilization and the
ligand-analyte interaction were the same: a baseline step
in assay buffer for 30 s, an association step in a C5a or Ab
solution for 300 s, and a dissociation step in assay buffer
for 300 s. (a) Optimization of C5a concentration for
immobilization. Several concentrations of biotinylated C5a
were tested (top) and then the interaction kinetics with an
Ab at a constant concentration (1000 nM) was monitored
(bottom). PS20 was included in the standard assay buffer
since the detergent decreased nonspecific binding.
(b) Characterization of the interaction between C5a and an
Ab. After C5a immobilization at 4500 nM (top), several
sensorgrams were obtained using varying Ab concentrations
(middle). Corrected data of Ab:C5a interaction were globally
fitted to the 1:1 binding model integrated in the BLItz
software, and the fitted curves (black) were plotted together
with the sensorgrams (middle). The fitted kinetic parameters,
with their standard error in brackets, and goodness-of-fit
statistics are shown (bottom). In the sensorgrams, the
vertical dashed lines separate distinct steps


6.2.5 Measure Association and Dissociation Kinetics

After ligand immobilization, the sensor tip is
transferred once again to assay buffer and the
BLI signal is recorded to establish the binding
baseline. In cases where ligand can dissociate
from the sensor tip, this step can be forgone.
Next, the sensor tip is dipped into the analyte
solution at the first assayed concentration, and
the association phase of the ligand:analyte
interaction is recorded (Fig. 6.3b). Sufficient time
should be allowed for the BLI signal to reach the
steady-state plateau, at least for the analyte
solution at the highest concentration. Immediately
afterward, the sensor tip is moved to assay buffer
to record the dissociation phase of the interaction
(Fig. 6.3b). Enough time should be allowed for
30-50% of the analyte to dissociate from the
sensor; if the interaction is very stable, the
dissociation time may have to be extended to
15-30 min. After the first sensorgram is fully
recorded, this multi-step procedure must be
repeated until all analyte concentrations have
been assayed.
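
How long the dissociation step needs to run can be estimated
in advance if a rough kd is available. Assuming
single-exponential dissociation of a 1:1 complex without
rebinding (an idealization), the time needed for a fraction f
of the analyte to dissociate is t = -ln(1 - f)/kd. The function
name and the kd values below are hypothetical.

```python
import math

def time_to_dissociate(fraction, kd):
    """Time (s) for the given fraction of bound analyte to dissociate."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return -math.log(1.0 - fraction) / kd

for kd in (1.0e-2, 1.0e-3, 4.0e-4):   # hypothetical kd values, 1/s
    t30 = time_to_dissociate(0.30, kd)
    t50 = time_to_dissociate(0.50, kd)
    print(f"kd = {kd:.0e} 1/s: 30% in {t30:.0f} s, 50% in {t50:.0f} s")
```

Under these assumptions, a kd of about 4 x 10^-4 s^-1 already
calls for roughly 15 to 30 min of dissociation, in line with
the extended times mentioned above for very stable
interactions.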
6.2.6 Regenerate Sensor Tip Surface (Optional)


If the sensor tips are to be reused, a suitable
regeneration buffer must be used to remove as
much of the remaining bound analyte as possible.
Commonly used regeneration buffer formulations
include low-pH buffers, detergents (e.g., sodium
dodecyl sulfate), and high ionic strength buffers.
Typically, several short exposures to the
regeneration buffer are more successful than a
single, longer incubation. As the harsh conditions
employed for sensor regeneration may damage
the ligand, it is imperative to confirm the sensor’s
state by dipping it into the highest analyte
concentration tested and comparing the binding capacity
and kinetics with those previously recorded. Ideally,
a successfully regenerated sensor should maintain
>90% of the original binding capacity and
equivalent binding kinetics.
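
The capacity check described above can be written as a small helper. This is a minimal sketch: the function names and the response values (in nm of wavelength shift) are hypothetical, and the comparison of binding kinetics, which still has to be done separately, is not covered.

```python
def retained_capacity(original_nm: float, regenerated_nm: float) -> float:
    """Fraction of the original binding response kept after regeneration."""
    if original_nm <= 0.0:
        raise ValueError("original response must be positive")
    return regenerated_nm / original_nm


def regeneration_acceptable(original_nm: float, regenerated_nm: float,
                            threshold: float = 0.90) -> bool:
    """True if the regenerated sensor keeps more than `threshold` capacity."""
    return retained_capacity(original_nm, regenerated_nm) > threshold


# Hypothetical responses at the highest analyte concentration (nm shift).
print(regeneration_acceptable(1.20, 1.13))
print(regeneration_acceptable(1.20, 0.95))
```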


6.2.7 Data Processing, Analysis, and Fitting


After recording all sensorgrams and before data
analysis can start, two corrections are applied.
Firstly, the blank curve at zero analyte (assay
buffer only) is subtracted from each sensorgram
to correct for the background signal. Secondly, the
starts of the association and dissociation steps are
aligned to the end of the immediately preceding
baseline or association phase, respectively.
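
The two corrections can be sketched in plain Python as follows. This is only an illustration under simple assumptions: sensorgrams are lists of responses sampled at identical time points, the step boundaries are known as indices, and all names and numbers are hypothetical rather than taken from any instrument software.

```python
def subtract_blank(sample, blank):
    """Subtract the zero-analyte (buffer-only) curve point by point."""
    if len(sample) != len(blank):
        raise ValueError("sample and blank must have the same length")
    return [s - b for s, b in zip(sample, blank)]


def align_steps(signal, step_starts):
    """Remove the jump at the start of each step.

    For every index in `step_starts` (ascending, each > 0), the curve from
    that index onward is shifted so that the first point of the step
    matches the last point of the preceding step.
    """
    aligned = list(signal)
    for start in step_starts:
        offset = aligned[start] - aligned[start - 1]
        for i in range(start, len(aligned)):
            aligned[i] -= offset
    return aligned


# Hypothetical sensorgram: baseline (0-2), association (3-5), dissociation (6-8).
sample = [0.10, 0.10, 0.10, 0.25, 0.50, 0.60, 0.52, 0.45, 0.40]
blank = [0.10, 0.10, 0.10, 0.12, 0.12, 0.12, 0.10, 0.10, 0.10]

corrected = subtract_blank(sample, blank)
aligned = align_steps(corrected, step_starts=[3, 6])
print([round(v, 3) for v in aligned])
```

Shifting the whole remainder of the curve at each boundary keeps the shape of every step intact while removing the artificial jumps caused by moving the tip between wells.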

Once these corrections have been applied, the
corrected sensorgrams are fitted using a
mathematical binding model. Fitting can be applied to
individual curves (local fitting) or to all curves
simultaneously (global fitting). In general, it is
better to perform global fitting with the simplest
possible binding model, treating the maximal
binding signal as a variable, since the level
of immobilized ligand can differ between
runs. Sensorgrams, fitted curves, kinetic
parameters, and fitting statistics must be
examined thoroughly, paying special attention to the
curves corresponding to low and high analyte
concentrations, which can show weak BLI signal
and aggregation, respectively. Besides visual
agreement between the experimental and fitted
sensorgrams, several criteria have been proposed
to evaluate the quality of fitting results:
parameter-associated errors should not exceed
10% of the parameter value; the fitting residuals
should be confined inside the interval defined by
±10% of the maximum response of the fitted
curve; the goodness-of-fit χ² statistic should be
smaller than 3; and the R² statistic greater than
95% [3].
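
To make the 1:1 model and the quality criteria concrete, here is a minimal Python sketch. It assumes the standard 1:1 (Langmuir) rate equations and takes χ² as the plain sum of squared residuals, which may differ from the definition used by a given analysis package; all function names and numerical values are hypothetical.

```python
import math


def association(t, conc, ka, kd, rmax):
    """1:1 association phase: response at time t (s) at analyte conc (M)."""
    k_obs = ka * conc + kd                  # observed rate constant (s^-1)
    r_eq = rmax * conc / (conc + kd / ka)   # plateau response, KD = kd / ka
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation(t, r0, kd):
    """1:1 dissociation phase: decay from response r0 after t seconds."""
    return r0 * math.exp(-kd * t)


def assess_fit(observed, fitted, params, chi2_max=3.0, r2_min=0.95):
    """Check the four quality criteria; params maps name -> (value, std error)."""
    residuals = [o - f for o, f in zip(observed, fitted)]
    ss_res = sum(r * r for r in residuals)  # used here as chi-squared (assumption)
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    limit = 0.10 * max(fitted)              # +/-10% of the maximum fitted response
    return {
        "param_errors_ok": all(abs(err) <= 0.10 * abs(val)
                               for val, err in params.values()),
        "residuals_ok": all(abs(r) <= limit for r in residuals),
        "chi2_ok": ss_res < chi2_max,
        "r2_ok": r2 > r2_min,
    }


# Hypothetical example: one association curve at 100 nM analyte.
times = list(range(0, 301, 5))
fitted = [association(t, 100e-9, 1e5, 1e-3, 1.0) for t in times]
observed = [f + 0.005 * math.sin(t) for f, t in zip(fitted, times)]  # synthetic noise
params = {"ka": (1e5, 2e3), "kd": (1e-3, 5e-5)}

report = assess_fit(observed, fitted, params)
print(report)
```

In a global fit, the same ka and kd would be shared by all curves while the maximal response is allowed to vary per sensor, as recommended above; the check itself can then be applied to each curve.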

Other recommendations to bear in mind to 
perform an optimal BLI experiment include: 


• Whenever possible, use the same assay buffer
for sensor hydration, ligand, analyte, and baseline
and dissociation steps; and use the same
tube/well with assay buffer for baseline and
the corresponding dissociation step.

• Minimize nonspecific binding by optimizing
ligand and analyte concentrations and the
assay buffer composition.

• Optimize step times to ensure fully
equilibrated (flat) baselines and enough
dissociation for reliable data fitting.

• Do not reuse biosensors without
regenerating them.

• Avoid foam and bubbles in all solutions since
they may disturb the BLI signal.


6.3 Examples of Protein-Protein Interactions Studied by BLI


Since its introduction, BLI has grown into an
easy-to-perform, versatile, label-free method for
the measurement of protein-protein interaction
kinetics and affinities in a wide variety of relevant
settings. In structural biology in particular,
BLI has made important contributions, since
the strength of protein-protein interactions
governs the stability of multisubunit complexes
over the time frame of experiments.

We provide below several examples of the 
application of BLI for the characterization of 
protein-protein interactions. 

Grela et al. used BLI to characterize the binding
kinetics of the interaction between the human
ribosomal proteins P1 and P2, which belong to a
pentameric complex called P-stalk, and the ricin A
chain (RTA) [36]. P-stalk is thought to be
involved in cellular processes such as the
recruitment of translational GTPases. For BLI, several
variants of P1 and P2, including the full-length
proteins and deletion mutants lacking the
C-terminal domain, were expressed in E. coli,
purified, and assembled into all possible P1-P2
heterodimers. RTA was selected as the ligand
and captured on an NTA sensor through
its hexahistidine tag, and the P1-P2
complexes were used as analytes. For each P1-P2
heterodimer, binding isotherms were recorded at
various analyte concentrations, and the curves
were globally fitted to a standard 1:1 binding
model. Additionally, the full-length pentameric
complex was expressed with a polycistronic
expression system, purified, and tested in the
same BLI experimental format for RTA binding.
Results showed that the pentameric complexes
had the highest affinity for RTA, and that the
C-terminal domain of P1 (but not P2) is critical
for the interaction [36].

In another application of BLI for structural 
biology, the interaction of the multidomain PfbA 
adhesin from the Gram-positive pathogen Strep- 
tococcus pneumoniae with several human 
proteins was investigated [37]. Here, the full- 
length, N-terminal, and central helix domains of 
PfbA were produced separately and used as 
ligands for BLI assays. Binding assays were 
performed for all known analytes (plasminogen, 
fibrinogen, and human serum albumin) sourced 
from commercial suppliers. Sensorgrams were 
fitted to a standard 1:1 binding model, with the 
result that the C-terminal domain of PfbA bound 
to fibrinogen, while the central region of PfbA 
bound to human serum albumin and plasmino- 
gen. These results were interpreted as evidence 
for a modular structure of PfbA, whereby the 
different domains are functionally independent, 
targeting distinct ligands [37]. 

The dip-and-read format characteristic of BLI
has also allowed weak/slow interactions to be
characterized. A case in point is the interaction
between the human anaphylatoxin C5a and the
moonlighting protein glyceraldehyde 3-phosphate
dehydrogenase (GAPDH) from various Gram-positive
and Gram-negative pathogens [38-41].
This interaction has been detected and
characterized by various methods, but a quantitative
measurement of binding kinetics had been
challenging owing to the weak or transient nature
of the interaction. Recently, our group determined
the crystal structure of GAPDH from the
Gram-negative pathogen Leptospira interrogans
[41] and showed, using BLI and other complementary
techniques, that it can bind C5a. To obtain a
direct measurement of the interaction binding
parameters, we immobilized biotinylated C5a
(sourced from Abvance Biotech) at 34 µg/mL on
SA sensors and dipped them into either assay buffer
(blank) or a concentrated GAPDH solution
(0.23 mM, ~9 mg/mL). The binding curve with
analyte yielded clear evidence for slow
(ka = 37.3 M⁻¹ s⁻¹), yet specific, binding.
Interestingly, the complex, once assembled, seemed rather
stable, as we could observe little or no dissociation
over the 300-s dissociation step. This observation
may hold relevance for the interaction in vivo, as
GAPDH has been identified as part of the cell wall
and the secretome of many bacterial pathogens.
Avidity effects on surfaces, combined with the
slow-dissociating interaction, would result in
effective sequestration of C5a, precluding
macrophages and neutrophils from being
recruited to the site of infection.
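
To put the slow association into perspective, the short Python sketch below estimates the observed rate constant and the half-time of the association phase from the values quoted above. It assumes a 1:1 model in which k_obs = ka·C + kd and, as a simplification based on the negligible dissociation observed, sets kd to zero; the function name is hypothetical.

```python
import math


def association_half_time(ka, conc, kd=0.0):
    """Return (k_obs, half-time) for a 1:1 association phase.

    k_obs = ka * conc + kd; the signal reaches half of its plateau
    after ln(2) / k_obs seconds.
    """
    k_obs = ka * conc + kd
    return k_obs, math.log(2) / k_obs


ka = 37.3         # M^-1 s^-1, association rate constant quoted in the text
conc = 0.23e-3    # M, GAPDH analyte concentration quoted in the text
kd_assumed = 0.0  # s^-1, assumption: dissociation was barely detectable

k_obs, t_half = association_half_time(ka, conc, kd_assumed)
print(f"k_obs = {k_obs:.2e} s^-1, half-time = {t_half:.0f} s")
```

With these inputs the half-time comes out at roughly 80 s, which illustrates why such a high analyte concentration is needed to observe binding within a practical association step.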

One of the main classes of molecular 
interactions studied by BLI is antigen-antibody 
binding. The high affinity and specificity of these 
interactions have propelled antibodies as promi- 
nent biologics in basic research and for diagnosing 
and treating many disorders. Therefore, antibody 
production and characterization are central 
endeavors for academic research as well as the 
pharmaceutical industry. BLI has proven valuable 
for applications in antibody development, includ- 
ing the screening and characterization stages. 
Some of these applications include comparing 
and classifying antibodies based on their kinetic 
and affinity properties; blocking assays that 
explore antigen-hindering interactions; antibody 
isotyping; and epitope binning by antigen binding 
competition assays [1, 31, 42-46]. 

BLI has been successfully used to develop
antibodies derived from Ig new antigen receptors
(IgNARs) from sharks. One exciting example
concerns IgNARs targeting the receptor-binding
domain (RBD) of the spike glycoprotein (S) of
the severe acute respiratory syndrome coronavirus
2 (SARS-CoV-2), which have potential
applications in COVID-19 diagnostics and
therapeutics [47]. In this study, two complementary
experimental designs were used: (1) each of four
single-domain antibodies derived from IgNARs
(VNAR) was immobilized on Protein A
biosensors and purified RBD was the analyte;
and (2) biotinylated RBD was immobilized on
SA biosensors and VNAR-Fc fusions derived
from the four initial VNARs to RBD were used
as analytes. Interestingly, while the VNAR
showed affinities for RBD in the 10-°-10-* M 
range, each VNAR-Fc exhibited increased 
affinities (10°-7-10°° M) with respect to the 
corresponding VNAR. Next, to evaluate the capac- 
ity of the VNAR-Fc for blocking the RBD binding 
to the human angiotensin-converting enzyme 
2 (ACE2), a critical interaction for the infectivity 
of SARS-CoV-2, biotinylated RBDs from wild- 
type (WT), delta and omicron variants of SARS- 
CoV-2 were immobilized on SA biosensors, 
incubated with the antibodies, and then exposed 
to ACE2. Remarkably, the best blocking antibody 
was JM-2-FC, which had previously shown an 
intermediate affinity to WT RBD ao~? M); and 
while the antibody’s ability to block WT and delta 
RBD varied between 12% and 86%, it did not 
exceed 20% for omicron RBD. Next, using 
biotinylated RBD immobilized on SA biosensors 
and VNAR-Fc at saturating concentrations, 
competition assays revealed that three 
non-overlapping epitopes were recognized on 
RBD by this set of antibodies. Lastly, five 
biparatopic VNAR were produced by combining 
noncompetitive VNAR and they were 
characterized with the previous RBD-VNAR bind- 
ing and blocking assays. Results indicated that 
nearly all the biparatopic antibodies exhibited 
higher affinities to RBD in the VNAR (10% 
107° M) and VNAR-Fe (1071-107 !? M) formats 
than their corresponding monovalent forms. Fur- 
thermore, the biparatopic VNAR-Fc exhibited 
blocking capacities of the distinct RBD-ACE2 
interactions between 28% and 73% [47]. 

BLI has also been extensively used to characterize
protein-protein interactions involved in
physiological and pathological processes. Several
protein-protein interactions relevant in cell signaling
have been studied by BLI, including the binding
of arrestin-3 to GPCR [48], the recognition of signal
regulatory protein α (SIRPα) by
integrin macrophage antigen 1 (Mac-1) [49], the
caspase 9 interaction with protein phosphatase
2A (PP2A) [50], and the binding of fortilin to
transforming growth factor beta 1 (TGF-β1)
[51]. Similarly, the binding of the transcription
factor GrgA to the σ⁶⁶ factor of the RNA
polymerase of Chlamydia has been addressed by BLI
[26]. Furthermore, the high-affinity (subnanomolar)
binding between skeletal muscle
myosin (SkM), which has procoagulant activity, and
coagulation factor XI (FXI) was analyzed by
BLI [52]. In recent years, multiple aspects of
SARS-CoV-2 biology have been addressed using
BLI, such as the effect of RBD mutations on the
RBD-ACE2 interaction [53, 54], the binding of
the S protein of SARS-CoV-2 to the C-type
lectins DC-SIGN and L-SIGN [55], and the
structural relevance of N-glycan sites (N165 and
N234) of the S protein for its binding to
ACE2 [56].

BLI assays have also been developed for the
study of complement system activation, a topic of
great interest for academia and industry given the
potent effector functions of complement. In one
case, BLI was used to address several unresolved
questions concerning the recognition of the Fc
region of IgG/IgM by C1q of the C1 complex,
a crucial event in the activation of the classical
pathway of complement [57]. As a first approach,
the researchers designed a BLI assay
consisting of IgG immobilized on Protein L or
SA sensors and C1q as the analyte; this assay
was used to measure the binding kinetics and
affinity of C1q for IgG1, IgG2, and IgG4
antibodies. The measured affinities for the various
IgG isotypes were in agreement with their capacity
to elicit complement-dependent cytotoxicity
(CDC). Interestingly, in the absence of the specific
antigen, IgG1 antibodies displayed similar kinetics
and nanomolar affinity for C1q, although they
produced notably different CDC effects. The
most striking antibody was Trastuzumab, which
does not cause CDC. However, in the presence of
antigen the binding capacity of Trastuzumab to
C1q was markedly hindered, emphasizing the
importance of antigen-antibody binding in
complement activation by the classical pathway
[57]. Afterwards, a BLI experiment for the
kinetic and affinity characterization of IgM-C1q
interactions was optimized. Similarly to IgG1-C1q
interactions, the affinities of C1q
for different IgM were in the nanomolar range,
although the association and dissociation kinetics of
the IgM-C1q interactions were slightly slower
[27]. More recently, a BLI approach in which
aminopropylsilane (APS) biosensors loaded with
polyethylene glycol (PEG) are exposed to anti-PEG
IgM diluted in untreated serum has proved
useful to evaluate complement activation
induced by anti-PEG IgM in the presence of
antigen [58].

Besides enabling the characterization of
numerous protein-protein interactions, BLI has
also been used to screen modulators of specific
interactions and study their regulatory function
[28, 33, 50]. For example, a BLI study of the
interaction between cyclin-dependent kinase
2 (CDK2) and cyclin A (CycA), and of the effect of
some known CDK2 inhibitors, has revealed
interesting aspects of their regulatory activities.
The optimized BLI procedure had N-terminally
biotinylated CDK2 immobilized on SA sensors
and CycA as the analyte. No differences in kinetics
or nanomolar affinity were detected when
GST-CDK2 was used as ligand (immobilized
onto anti-GST sensors) or when biotinylated
CDK2 was phosphorylated on its activation
loop. Next, the modulating effects of some
CDK2 inhibitors on CDK2-CycA binding
properties were analyzed using three methodological
variants: (1) CDK2 was incubated with
an inhibitor at a fixed concentration before
performing the binding assay at several CycA
concentrations; (2) CDK2 was exposed to several
inhibitor concentrations before the binding assay
with a fixed concentration of CycA; and (3)
the inhibitor was added after the association of
CycA to CDK2. Interestingly, the inhibitors
BMS265246 and dinaciclib, which target the
ATP binding site of CDK2, increased the affinity
of CycA to CDK2 by decreasing the kd. In contrast,
the cyclin competitors AUZ 454 and MC14
reduced the CDK2-CycA affinity by apparently
different mechanisms: while MC14 seemed
to hinder CycA association to CDK2, AUZ
454 was suspected to form a ternary complex
with CDK2-CycA, thereby decreasing the affinity
by increasing both ka and kd [35].
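
The link between rate constants and affinity in this example follows from KD = kd/ka for a 1:1 interaction. The Python sketch below uses purely hypothetical rate constants (they are not values from the cited study) to show how a lower kd raises affinity, and how raising both ka and kd can still lower affinity when kd increases by the larger factor.

```python
def equilibrium_dissociation_constant(ka: float, kd: float) -> float:
    """KD (M) from ka (M^-1 s^-1) and kd (s^-1) for a 1:1 interaction."""
    if ka <= 0.0:
        raise ValueError("ka must be positive")
    return kd / ka


# Hypothetical rate constants, chosen only to illustrate the trends.
scenarios = {
    "no inhibitor": (1e5, 1e-2),
    "slower dissociation (lower kd)": (1e5, 1e-3),
    "faster association and dissociation": (2e5, 1e-1),
}

kd_values = {
    name: equilibrium_dissociation_constant(ka, kd)
    for name, (ka, kd) in scenarios.items()
}
for name, value in kd_values.items():
    print(f"{name}: KD = {value:.1e} M")
```

A smaller KD means higher affinity, so the second scenario binds more tightly than the reference and the third more weakly, even though its association is faster.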

Other novel applications of BLI have been
described in the context of protein-protein
interactions, such as the study of enzymatic
activities and post-translational modifications.
For example, a BLI approach to follow horseradish
peroxidase (HRP) activity has been developed
using tyramide, an HRP substrate that, through
HRP activity in the presence of H₂O₂, binds
covalently to nearby tyrosine and tryptophan
residues. In the assay, immobilized HRP is exposed
to a reaction mixture containing tyramide, and
the peroxidase activity is followed through the BLI
signal, which is proportional to tyramide binding
to the protein immobilized on the biosensor
[59]. Another BLI assay has been optimized to
study the activity of the protease of human
immunodeficiency virus type 1 (HIV-1). This protease
assay is based on monitoring the proteolysis of an
immobilized substrate by BLI, as the BLI signal
diminishes as a consequence of the loss of substrate
fragments from the biosensor [22]. Finally, a
recent article describes a BLI procedure to analyze
protein ubiquitination that monitors
the increase in BLI signal accompanying the
post-translational modification of an immobilized
substrate carried out by a ubiquitination
reaction mixture [60].


6.4 Concluding Remarks

Biolayer interferometry (BLI) is a robust
biophysical technique for the quantification of
kinetic parameters pertinent to a variety of molecular
interactions, encompassing protein-protein,
protein-nucleic acid, and protein-small molecule
interactions. The technique detects the
interference wavelength shifts induced by
molecular binding events at the sensor’s tip.
Notably, BLI obviates the need for fluorescent
or radioactive labeling of the interacting
entities. Nevertheless, one interaction partner,
commonly referred to as the ligand, must be
immobilized on the functionalized sensor surface.
The other partner, the analyte, is
brought into contact with the immobilized ligand
via sensor immersion (“dip”) into the
analyte-containing solution. Subsequent recording
(“read”) of binding isotherms or sensorgrams at
varying analyte concentrations enables the
determination of key kinetic parameters, including the
association rate constant, the dissociation rate
constant, and the equilibrium dissociation
constant (ka, kd, and KD). The technique’s
“dip-and-read” format offers compatibility with
complex, unrefined samples.


Acknowledgments This work was funded by the Spanish
Ministerio de Ciencia, Innovación y Universidades-FEDER
grant RTI2018-102242-BI00 (MCV), the Spanish
Ministerio de Ciencia e Innovación-Recovery,
Transformation and Resilience Plan (PRTR) grant
PDC2022-133713-I00 (MCV), grant S2022/BMD-7278 of the
Regional Government of Madrid (MCV), the European
Commission – NextGenerationEU through CSIC’s Global
Health Platform (“PTI Salud Global”) (SGL2103020)
(MCV), and the CSIC Special Intramural Grant
PIE201620E064 (MCV). It was additionally supported
by the Research Network on Complement in Health and
Disease (RED2022-134750-T). KdlP was supported by an
Industrial PhD grant (IND2018-010094) awarded by the
Spanish Ministerio de Economía y Competitividad. JSL
acknowledges the support of the PhD program in
Molecular Biosciences of the Universidad Autónoma de Madrid
(UAM) and the Ministry of Education, Culture and Sports
of Spain (FPU Grant 17/06090). KdlP acknowledges the
support of the PhD program in Biochemistry, Molecular
Biology and Biomedicine of the Universidad Complutense
de Madrid (UCM).


References 


1. Concepcion J, Witte K, Wartchow C, Choo S, Yao D, 
Persson H, Wei J, Li P, Heidecker B, Ma W, Varma R, 
Zhao L-S, Perillat D, Carricato G, Recknor M, Du K, 
Ho H, Ellis T, Gamez J, Howes M, Phi-Wilson J, 
Lockard S, Zuk R, Tan H (2009) Label-free detection 
of biomolecular interactions using biolayer interferom- 
etry for kinetic characterization. Comb Chem High 
Throughput Screen 12:791—800. https://doi.org/10. 
2174/138620709789 104915 

2. Kumaraswamy S, Tobias R (2015) Label-free kinetic 
analysis of an antibody—antigen interaction using 


10. 


11. 


12. 


13. 


14. 


15. 


85 


biolayer interferometry. In: Meyerkord CL, Fu H 
(eds) Protein-Protein Interactions. Springer, 
New York, pp 165-182 


. Sultana A, Lee JE (2015) Measuring protein-protein 


and protein-nucleic acid interactions by biolayer inter- 
ferometry. Curr Protoc Protein Sci 79. https://doi.org/ 
10.1002/0471140864.ps1925s79 


. Apiyo DO (2017) Biolayer interferometry (Octet) for 


label-free biomolecular interaction sensing. In: 
Schasfoort RBM (ed) Handbook of surface plasmon 
resonance, 2nd edn. The Royal Society of Chemistry, 
London, pp 356-397 


. Cleaver S, Gardner M, Barlow A, Ferrari E, Soloviev 


M (2023) Fast protocols for characterizing antibody— 
peptide binding. In: Cretich M, Gori A (eds) Peptide 
microarrays. Springer, New York, pp 83-101 


. Nirschl M, Reuter F, Vörös J (2011) Review of trans- 


ducer principles for label-free biomolecular interaction 
analysis. Biosensors 1:70-92. https://doi.org/10.3390/ 
bios 1030070 


. Abdiche Y, Malashock D, Pinkerton A, Pons J (2008) 


Determining kinetics and affinities of protein 
interactions using a parallel real-time label-free biosen- 
sor, the Octet. Anal Biochem 377:209-217. https://doi. 
org/10.1016/j.ab.2008.03.035 


. Yang D, Singh A, Wu H, Kroe-Barrett R (2016) Com- 


parison of biosensor platforms in the evaluation of high 
affinity antibody-antigen binding kinetics. Anal Biochem 
508:78-96. https://doi.org/10.1016/j.ab.2016.06.024 


. Martin SR, Ramos A, Masino L (2021) Biolayer inter- 


ferometry: protein—RNA interactions. In: Daviter T, 
Johnson CM, McLaughlin SH, Williams MA (eds) 
Protein-ligand interactions. Springer, New York, pp 
351-368 

Weeramange CJ, Fairlamb MS, Singh D, Fenton AW, 
Swint-Kruse L (2020) The strengths and limitations of 
using biolayer interferometry to monitor equilibrium 
titrations of biomolecules. Protein Sci 29:1004—1020. 
https://doi.org/10.1002/pro.3827 

Ingale J, Wyatt R (2015) Kinetic analysis of monoclo- 
nal antibody binding to HIV-1 gp120-derived 
hyperglycosylated cores. Bio-Protocol 5. https://doi. 
org/10.21769/BioProtoc.1615 

Kol S, Kallehauge TB, Adema S, Hermans P (2015) 
Development of a VHH-based erythropoietin quantifi- 
cation assay. Mol Biotechnol 57:692—700. https://doi. 
org/10.1007/s12033-015-9860-7 

Zhang H, Li W, Luo H, Xiong G, Yu Y (2017) Quan- 
titative determination of testosterone levels with 
biolayer interferometry. Chem Biol Interact 276:141— 
148. https://doi.org/10.1016/j.cbi.2017.05.013 
Carvalho SB, Moreira AS, Gomes J, Carrondo MJT, 
Thornton DJ, Alves PM, Costa J, Peixoto C (2018) A 
detection and quantification label-free tool to speed up 
downstream processing of model mucins. PLoS ONE 
13:e0190974. https://doi.org/10.1371/journal.pone. 
0190974 

Gao S, Zheng X, Wu J (2018) A biolayer 
interferometry-based enzyme-linked aptamer sorbent 


86 


16. 


17. 


18. 


19. 


20. 


21. 


22. 


23. 


24. 


25. 


26. 


27. 


assay for real-time and highly sensitive detection of 
PDGF-BB. Biosens Bioelectron 102:57—62. https:// 
doi.org/10.1016/j.bios.2017.11.017 

Gao S, Li Q, Zhang S, Sun X, Zheng X, Qian H, Wu J 
(2022) One-step high-throughput detection of 
low-abundance biomarker BDNF using a biolayer 
interferometry-based 3D  aptasensor. Biosens 
Bioelectron 215:114566. https://doi.org/10.1016/j. 
bios.2022.114566 

Dysinger M, King LE (2012) Practical quantitative and 
kinetic applications of bio-layer interferometry for 
toxicokinetic analysis of a monoclonal antibody thera- 
peutic. J Immunol Methods 379:30—41. https://doi.org/ 
10.1016/).jim.2012.02.017 

Rao VS, Srinivas K, Sujini GN, Kumar GNS (2014) 
Protein-protein interaction detection: methods and 
analysis. Int J Proteomics 2014:1—12. https://doi.org/ 
10.1155/2014/147648 

Biswas P (2018) Modern biophysical approaches to 
study protein—ligand interactions. Biophys Rev Lett 
13:133-155. https://doi.org/10.1142/ 
$1793048018300013 

Carvalho SB, Moleirinho MG, Wheatley D, Welsh J, 
Gantier R, Alves PM, Peixoto C, Carrondo MJT 
(2017) Universal label-free in-process quantification 
of influenza virus-like particles. Biotechnol J 12: 
1700031. https://doi.org/10.1002/biot.20170003 1 
Overacker RD, Plitzko B, Loesgen S (2021) Biolayer 
interferometry provides a robust method for detecting 
DNA binding small molecules in microbial extracts. 
Anal Bioanal Chem 413:1159—1171. https://doi.org/ 
10.1007/s002 16-020-03079-5 

Miczi M, Diós A, Bozóki B, Tőzsér J, Mótyán JA 
(2021) Development of a bio-layer interferometry- 
based protease assay using HIV-1 protease as a 
model. Viruses 13:1183. https://doi.org/10.3390/ 
v13061183 

Li A, Harris RJ, Fry BG, Barnes AC (2021) A single- 
step, high throughput, and highly reproducible method 
for measuring IgM quantity and avidity directly from 
fish serum via biolayer interferometry (BLI). Fish 
Shellfish Immunol 119:231—237. https://doi.org/10. 
1016/j.fsi.2021.10.003 

Wilson JL, Scott IM, McMurry JL (2010) Optical 
biosensing: kinetics of protein A-IGG binding using 
biolayer interferometry. Biochem Mol Biol Educ 38: 
400—407. https://doi.org/10.1002/bmb.20442 

Petersen R (2017) Strategies using bio-layer interfer- 
ometry biosensor technology for vaccine research and 
development. Biosensors 7:49. https://doi.org/10. 
3390/bios7040049 

Desai M, Di R, Fan H (2019) Application of biolayer 
interferometry (BLI) for studying protein-protein 
interactions in transcription. J Vis Exp 59687. https:// 
doi.org/10.3791/59687 

Chouquet A, Pinto AJ, Hennicke J, Ling WL, Bally I, 
Schwaigerlehner L, Thielens NM, Kunert R, Reiser 
J-B (2022) Biophysical characterization of the oligo- 
meric states of recombinant immunoglobulins type-M 


28. 


29. 


30. 


31. 


32; 


33. 


34. 


35; 


36. 


37, 


38. 


J. Santos-López et al. 


and their Clq-binding kinetics by biolayer interferom- 
etry. Front Bioeng Biotechnol 10:816275. https://doi. 
org/10.3389/fbioe.2022.816275 

Shah NB, Duncan TM (2014) Bio-layer interferometry 
for measuring kinetics of protein-protein interactions 
and allosteric ligand effects. J Vis Exp 51383. https:// 
doi.org/10.3791/51383 

Ullah SF, Moreira G, Datta SPA, McLamore E, 
Vanegas D (2022) An experimental framework for 
developing point-of-need biosensors: connecting 
bio-layer interferometry and electrochemical imped- 
ance spectroscopy. Biosensors 12:938. https://doi. 
org/10.3390/bios 12110938 

Zhao H, Boyd LF, Schuck P (2017) Measuring protein 
interactions by optical biosensors. Curr Protoc Protein 
Sci 88. https://doi.org/10.1002/cpps.31 

Noy-Porat T, Alcalay R, Mechaly A, Peretz E, 
Makdasi E, Rosenfeld R, Mazor O (2021) Characteri- 
zation of antibody-antigen interactions using biolayer 
interferometry. STAR Protoc 2:100836. https://doi. 
org/10.1016/j.xpro.2021.100836 

Dubrow A, Zuniga B, Topo E, Cho J-H (2022) 
Suppressing nonspecific binding in biolayer interfer- 
ometry experiments for weak ligand—analyte 
interactions. ACS Omega 7:9206-9211. https://doi. 
org/10.1021/acsomega.1c05659 

Miiller-Esparza H, Osorio-Valeriano M, Steube N, 
Thanbichler M, Randau L (2020) Bio-layer interfer- 
ometry analysis of the target binding activity of 
CRISPR-Cas effector complexes. Front Mol Biosci 7: 
98. https://doi.org/10.3389/fmolb.2020.00098 
Wartchow CA, Podlaski F, Li S, Rowan K, Zhang X, 
Mark D, Huang K-S (2011) Biosensor-based small 
molecule fragment screening with biolayer interferom- 
etry. J Comput Aided Mol Des 25:669-676. https:// 
doi.org/10.1007/s10822-011-9439-8 

Tambo CS, Tripathi S, Perera BGK, Maly DJ, Bridges 
AJ, Kiss G, Rubin SM (2023) Biolayer interferometry 
assay for cyclin-dependent kinase-cyclin association 
reveals diverse effects of Cdk2 inhibitors on cyclin 
binding kinetics. ACS Chem Biol 18:431-440. 
https://doi.org/10.1021/acschembio.3c00015 

Grela P, Li X-P, Horbowicz P, Dźwierzyńska M, 
Tchórzewski M, Tumer NE (2017) Human ribosomal 
P1-P2 heterodimer represents an optimal docking site 
for ricin A chain with a prominent role for P1 
C-terminus. Sci Rep 7:5608. https://doi.org/10.1038/ 
s41598-017-05675-5 

Beulin DSJ, Radhakrishnan D, Suresh SC, 
Sadasivan C, Yamaguchi M, Kawabata S, Ponnuraj 
K (2017) Streptococcus pneumoniae surface protein 
PfbA is a versatile multidomain and multiligand- 
binding adhesin employing different binding 
mechanisms. FEBS J 284:3404—3421. https://doi.org/ 
10.111 1/febs.14200 

Querol-Garcia J, Fernandez FJ, Marin AV, Gómez S, 
Fulla D, Melchor-Tafur C, Franco-Hidalgo V, 
Alberti S, Juanhuix J, Rodriguez De Cérdoba S, 
Regueiro JR, Vega MC (2017) Crystal structure of 


Binding Kinetics by BLI 


39. 


40. 


41. 


42. 


43. 


44. 


45. 


46. 


47. 


glyceraldehyde-3-phosphate dehydrogenase from the 
gram-positive bacterial pathogen A. vaginae, an 
immunoevasive factor that interacts with the human 
CSa anaphylatoxin. Front Microbiol 8:541. https://doi. 
org/10.3389/fmicb.2017.00541 

Fernandez FJ, Gomez S, Vega MC (2019) Pathogens’ 
toolbox to manipulate human complement. Semin Cell 
Dev Biol 85:98-109. https://doi.org/10.1016/j. 
semcdb.2017.12.001 

Gómez S, Querol-Garcia J, Sanchez-Barrén G, 
Subias M, Gonzalez-Alsina A, Franco-Hidalgo V, 
Alberti S, Rodriguez De Córdoba S, Fernandez FJ, 
Vega MC (2019) The antimicrobials anacardic acid 
and curcumin are not-competitive inhibitors of gram- 
positive bacterial pathogenic glyceraldehyde-3-phos- 
phate dehydrogenase by a mechanism unrelated to 
human CSa anaphylatoxin binding. Front Microbiol 
10:326. https://doi.org/10.3389/fmicb.2019.00326 
Navas-Yuste S, De La Paz K, Querol-Garcia J, 
Goémez-Quevedo S, Rodriguez De Córdoba S, 
Fernandez FJ, Vega MC (2023) The structure of 
Leptospira interrogans GAPDH sheds light into 
an immunoevasion factor that can target the 
anaphylatoxin CSa of innate immunity. Front Immunol 
14:1190943. https://doi.org/10.3389/fimmu.2023. 
1190943 

Lad L, Clancy S, Kovalenko M, Liu C, Hui T, 
Smith V, Pagratis N (2015) High-throughput kinetic 
screening of hybridomas to identify high-affinity 
antibodies using bio-layer interferometry. SLAS 
Discov 20:498—-507. https://doi.org/10.1177/ 
1087057114560123 

Kamat V, Rafique A (2017) Designing binding kinetic 
assay on the bio-layer interferometry (BLI) biosensor 
to characterize antibody-antigen interactions. Anal 
Biochem 536:16—31. https://doi.org/10.1016/j.ab. 
2017.08.002 

Choi JR, Kim MJ, Tae N, Wi TM, Kim S-H, Lee ES, 
Kim DH (2020) BLI-based functional assay in phage 
display benefits the development of a PD-L1-targeting 
therapeutic antibody. Viruses 12:684. https://doi.org/ 
10.3390/v 12060684 

Bell BN, Powell AE, Rodriguez C, Cochran JR, Kim 
PS (2021) Neutralizing antibodies targeting the SARS- 
CoV-2 receptor binding domain isolated from a naive 
human antibody library. Protein Sci 30:716—727. 
https://doi.org/10.1002/pro.4044 

Sim DS, Shukla M, Mallari CR, Fernandez JA, Xu X, 
Schneider D, Bauzon M, Hermiston TW, Mosnier LO 
(2023) Selective modulation of activated protein C 
activities by a nonactive site-targeting nanobody 
library. Blood Adv 7:3036—3048. https://doi.org/10. 
1182/bloodadvances.2022008740 

Chen Y-L, Lin J-J, Ma H, Zhong N, Xie X-X, Yang Y, 
Zheng P, Zhang L-J, Jin T, Cao M-J (2022) Screening 
and characterization of shark-derived VNARs against 
SARS-CoV-2 spike RBD protein. Int J Mol Sci 23: 
10904. https://doi.org/10.3390/ijms23 1810904 


48. 


49. 


50. 


51. 


52. 


53. 


54. 


55. 


56. 


57. 


87 


Avsar SY, Kapinos LE, Schoenenberger C-A, 
Schertler GFX, Mühle J, Meger B, Lim RYH, 
Ostermaier MK, Lesca E, Palivan CG (2020) Immobi- 
lization of arrestin-3 on different biosensor platforms 
for evaluating GPCR binding. Phys Chem Chem 
Phys 22:24086-24096. https://doi.org/10.1039/ 
DOCP01464H 

Podolnikova NP, Hlavackova M, Wu Y, Yakubenko 
VP, Faust J, Balabiyev A, Wang X, Ugarova TP 
(2019) Interaction between the integrin Mac-1 and 
signal regulatory protein a (SIRPa) mediates fusion 
in heterologous cells. J Biol Chem 294:7833—7849. 
https://doi.org/10.1074/jbc.RA118.006314 

Dorgham K, Murail S, Tuffery P, Savier E, Bravo J, 
Rebollo A (2022) Binding and kinetic analysis of 
human protein phosphatase PP2A interactions with 
caspase 9 protein and the interfering peptide C9h. 
Pharmaceutics 14:2055.
https://doi.org/10.3390/pharmaceutics14102055

Pinkaew D, Martinez-Hackert E, Jia W, King MD, 
Miao F, Enger NR, Silakit R, Ramana K, Chen S-Y, 
Fujise K (2022) Fortilin interacts with TGF-β1 and
prevents TGF-β receptor activation. Commun Biol 5:157.
https://doi.org/10.1038/s42003-022-03112-6

Morla S, Deguchi H, Zilberman-Rudenko J, Gruber A, 
McCarty OJT, Srivastava P, Gailani D, Griffin JH 
(2022) Skeletal muscle myosin promotes coagulation 
by binding factor XI via its A3 domain and enhancing 
thrombin-induced factor XI activation. J Biol Chem 
298:101567. https://doi.org/10.1016/j.jbc.2022.101567 
Gong SY, Chatterjee D, Richard J, Prévost J, 
Tauzin A, Gasser R, Bo Y, Vézina D, Goyette G, 
Gendron-Lepage G, Medjahed H, Roger M, Côté M, 
Finzi A (2021) Contribution of single mutations to 
selected SARS-CoV-2 emerging variants spike antige- 
nicity. Virology 563:134—145. https://doi.org/10. 
1016/j.virol.2021.09.001 

Vogel M, Augusto G, Chang X, Liu X, Speiser D, 
Mohsen MO, Bachmann MF (2022) Molecular defini- 
tion of severe acute respiratory syndrome coronavirus 
2 receptor-binding domain mutations: receptor affinity 
versus neutralization of receptor interaction. Allergy 
77:143-149. https://doi.org/10.1111/all.15002 
Simpson JD, Ray A, Marcon C, Dos Santos NR, 
Dorrazehi GM, Durlet K, Koehler M, Alsteens D 
(2023) Single-molecule analysis of SARS-CoV-2 binding
to C-type lectin receptors. Nano Lett 23:1496-1504.
https://doi.org/10.1021/acs.nanolett.2c04931

Casalino L, Gaieb Z, Goldsmith JA, Hjorth CK, 
Dommer AC, Harbison AM, Fogarty CA, Barros EP, 
Taylor BC, McLellan JS, Fadda E, Amaro RE (2020) 
Beyond shielding: the roles of glycans in the SARS- 
CoV-2 spike protein. ACS Cent Sci 6:1722-1734. 
https://doi.org/10.1021/acscentsci.0c01056 

Zhou W, Lin S, Chen R, Liu J, Li Y (2018) Character- 
ization of antibody-C1q interactions by Biolayer Inter- 
ferometry. Anal Biochem 549:143-148. https://doi. 
org/10.1016/j.ab.2018.03.022 




Mostafa M, Elsadek NE, Emam SE, Ando H, 
Shimizu T, Abdelkader H, Ishima Y, Aly UF, Sarhan 
HA, Ishida T (2022) Using bio-layer interferometry to 
evaluate anti-PEG antibody-mediated complement 
activation. Biol Pharm Bull 45:129-135. https://doi. 
org/10.1248/bpb.b21-00772 

Kojima T, Nakane A, Zhu B, Alfi A, Nakano H (2019) 
A simple, real-time assay of horseradish peroxidase
using biolayer interferometry. Biosci Biotechnol 
Biochem 83:1822-1828.
https://doi.org/10.1080/09168451.2019.1621156

De Silva ARI, Shrestha S, Page RC (2023) Real-time 
bio-layer interferometry ubiquitination assays as 
alternatives to western blotting. Anal Biochem 
679:115296. https://doi.org/10.1016/j.ab.2023.115296 




Studying Macromolecular Interactions 
of Cellular Machines by the Combined 
Use of Analytical Ultracentrifugation, 
Light Scattering, and Fluorescence 
Spectroscopy Methods 


Carlos Alfonso, Marta Sobrinos-Sanguino, 
Juan Roman Luque-Ortega, Silvia Zorrilla, Begoña Monterroso, 
Oscar M. Nuero, and German Rivas 


Abstract 


Cellular machines formed by the interaction 
and assembly of macromolecules are essential 
in many processes of the living cell. 
These assemblies involve homo- and 
hetero-associations, including protein-protein, 
protein-DNA, protein-RNA, and protein- 
polysaccharide associations, most of which 
are reversible. This chapter describes the use 
of analytical ultracentrifugation, light scatter- 
ing, and fluorescence-based methods, well- 
established biophysical techniques, to charac- 
terize interactions leading to the formation of 
macromolecular complexes and their modula- 
tion in response to specific or unspecific 
factors. We also illustrate, with several 
examples taken from studies on bacterial pro- 
cesses, the advantages of the combined use of 
subsets of these techniques as orthogonal ana- 
lytical methods to analyze protein oligomeri- 
zation and polymerization, interactions with 
ligands, hetero-associations involving membrane
proteins, and protein-nucleic acid complexes.


Keywords 
angle light scattering - Fluorescence

7.1 Introduction

Most of the biochemical reactions within living 
cells, both under physiological conditions and in 
pathological processes, involve the formation of 
macromolecular complexes. Understanding the 
processes where these complexes take part as 
cellular machines requires an in-depth analysis 
of the interactions leading to their assembly, 
which can be approached by using different analytical
techniques. In this chapter we describe several well-established
biophysical techniques for the analysis of these complexes:
analytical ultracentrifugation (AUC), size exclusion chromatography
coupled to multi-angle light scattering (SEC-MALS), composition
gradient multi-angle light scattering (CG-MALS), dynamic light
scattering (DLS), fluorescence anisotropy, and fluorescence
correlation and cross-correlation spectroscopy (FCS and FCCS),
which can be used individually or combined as orthogonal methods.
AUC is a reference method for the quantitative analysis of
macromolecules and their interactions in solution. When applied to
the study of homo- and hetero-associations leading to the formation
of macromolecular complexes, it allows stoichiometry, cooperativity
and equilibrium binding affinity to be determined [1-4]. Size
exclusion chromatography (SEC) and light scattering techniques
(MALS and DLS), on the other hand, are high-throughput techniques
for the characterization and quality control of macromolecules in
biological and biopharmaceutical studies. These techniques can be
automated and are therefore preferred over other methods in many
industrial procedures [5]. The mass and size of macromolecules, and
their interactions to form complexes, have been extensively
characterized by MALS and DLS [6]. We will also briefly introduce
fluorescence spectroscopy methods which, owing to their excellent
sensitivity, are suitable for the analysis of high-affinity
interactions in solution and in complex media, including
reconstituted systems and live cells [7]. These methods have
frequently been employed to develop screening assays for
biotechnological or medical purposes, as most fluorescence-based
measurements can be conducted simultaneously, in a short time, on
hundreds of samples of a few microliters by using plate readers.
The complementary use of AUC, light scattering and fluorescence
techniques provides a more robust description of the molecular
mechanisms involved in a biological process than the application of
a single method, with its associated limitations and uncertainties
(Fig. 7.1), as will be shown with several examples presented below.
Furthermore, these techniques have been of great utility to
differentiate between several possible models of interaction in
hetero-associations, strengthening the results obtained by
structural techniques such as NMR or EM [8-10].

7.2 Analytical Ultracentrifugation (AUC)


Analytical ultracentrifugation is a powerful technique well suited
to the quantitative characterization of macromolecular associations
in solution. AUC has been successfully applied to the study of
proteins, nucleic acids, polysaccharides and nanoparticles, among
others, and is a standard method to determine sample purity,
association state and molar mass of individual macromolecules. It
is based on the application of a centrifugal force and the
real-time monitoring of the resulting spatial redistribution of the
macromolecules, followed by the quantitative analysis of the
recorded raw data [1, 4, 11]. It is aimed at characterizing
molecular complexes in terms of molecular size and shape,
stoichiometry and thermodynamic binding constants. Experiments are
performed without the need for labelling or any other chemical
modification of the sample, and there is no interaction with any
matrix or surface. AUC can be used over a wide range of
temperatures (0-40 °C) and with a great variety of buffers,
including high-concentration and macromolecular crowding conditions
[12]. Comprehensive descriptions of the basic principles and common
experimental procedures may be found in several previous
publications [11, 13-15].


7.2.1 Instrumentation and General Experimental Considerations


Analytical ultracentrifuges are equipped with an optical system
that allows the observation of the sample while it is sedimenting.
For this purpose, the analytical rotor contains 4-8 holes in which
the sample holders (cells) are aligned. Each cell has a 3-12 mm
pathlength epon-charcoal double-sector centerpiece with two
chambers (80-400 µL sample volume), one for the sample and one for
the reference buffer, flanked and sealed by quartz or sapphire
windows that allow the passage of light across the sectors and
hence the monitoring of the migration of macromolecules during the
sedimentation process. There are three commercially available
optical detectors for analytical ultracentrifuges: a UV-VIS
absorbance spectrophotometer, a Rayleigh interferometer (proteins,
DNA, RNA and polysaccharides can be detected and quantified) and a
fluorescence detector (for labelled macromolecules).

Fig. 7.1 Illustration of the different biophysical methods described in this chapter for the characterization of molecular

AUC covers a broad range of concentrations, from 5 µg/mL to
>100 mg/mL. Sensitivity can be further improved to 0.1 nM when
using the fluorescence detector with labelled macromolecules. The
range of molecular weights suitable for AUC analysis extends from
low molecular weight peptides (1 kDa) to large macromolecular
structures (several hundred million Da), and multiple coexisting
macromolecular species and complexes can be distinguished. Two
complementary AUC analytical methods are available: sedimentation
velocity and sedimentation equilibrium.


7.2.2 Sedimentation Velocity (SV) 

SV is a hydrodynamic method ideal for characterizing biological
systems, in which a high centrifugal force is applied and
macromolecules are separated based on their differences in mass and
shape. The rate of transport is measured by recording, at defined
time intervals, absorbance, interference or fluorescence scans of
the sedimenting macromolecular species present in the samples. The
resulting sedimentation coefficient distribution of macromolecules,
c(s) (Fig. 7.1), provides information on the sedimentation
coefficient, concentration and, under favorable conditions, the
molar mass of the sedimenting macromolecules [16]. Improvements in
analytical methods over the last two decades [17, 18] have made SV
one of the most powerful and versatile techniques for studying the
homo- and hetero-associations that give rise to macromolecular
complexes. It provides information on the number of complexes
formed, their stoichiometry and binding affinity.

For each individual sample, recorded scans are globally analyzed
with the c(s) method implemented in the SEDFIT software [18], which
uses finite element solutions of the Lamm equation combined with
size distribution analysis techniques by maximum entropy
regularization [18, 19]. Other analysis packages, such as UltraScan
[20], are also used for the analysis of AUC experimental data.
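
For orientation, the Lamm equation and the c(s) model mentioned above can be written in their usual textbook form. The notation is an assumption made here, not taken from the chapter: $c$ is the local concentration, $r$ the distance from the axis of rotation, $t$ time, $\omega$ the rotor angular velocity, $a(r,t)$ the recorded signal and $\chi$ the normalized Lamm-equation solution for a species with sedimentation coefficient $s$ and diffusion coefficient $D(s)$.

```latex
% Lamm equation for a single ideal species in a sector-shaped cell
\[
\frac{\partial c}{\partial t}
  = \frac{1}{r}\,\frac{\partial}{\partial r}
    \left[\, r D \frac{\partial c}{\partial r} - s\,\omega^{2} r^{2} c \,\right]
\]
% c(s): the data are modeled as a superposition of Lamm-equation solutions
\[
a(r,t) \cong \int c(s)\,\chi\bigl(s, D(s), r, t\bigr)\,\mathrm{d}s
\]
```

The first term in the brackets describes diffusive transport and the second describes sedimentation; fitting c(s) amounts to finding the distribution that best reproduces all scans at once.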

The conventional c(s) distribution is confined to a single
frictional ratio, which is typically determined as a weight-average
frictional ratio, $f_{r,w}$, of all sedimenting species from
nonlinear optimization of this parameter [16, 18]. The units of
c(s) are such that integration over a peak gives the total signal
of material sedimenting within the peak and a well-defined and
precise weight-average s-value (or signal-average s-value) [3].
These s-values are corrected to standard $s_{20,w}$ values (water,
20 °C, and infinite dilution) [21] using the SEDNTERP software
[22]. Similarly, the size distribution can be calculated as a molar
mass distribution c(M) [19] directly from c(s) since, for each
s-value in the c(s) distribution, a translational diffusion
coefficient D(s) is estimated based on $f_{r,w}$. In the case of
macromolecular complexes, the $D_t$ measured independently by
dynamic light scattering and/or fluorescence correlation
spectroscopy may be used to calculate $f_{r,w}$, which is then
fixed in the c(s) analysis; this in turn makes the c(s) and c(M)
distributions equivalent [23].
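
The correction of an experimental s-value to standard conditions can be sketched with the usual buoyancy and viscosity factors. This is a minimal illustration and not the SEDNTERP implementation. The function name, the default water constants and the example buffer values are assumptions, the partial specific volume is taken to be the same at both temperatures, and the extrapolation to infinite dilution is not covered.

```python
def s20w(s_exp, vbar, rho_buffer, eta_buffer,
         rho_water_20=0.9982, eta_water_20=1.002e-2):
    """Correct an experimental s-value to water at 20 degC.

    s_exp      : s-value measured in the buffer (any unit, e.g. Svedberg)
    vbar       : partial specific volume of the macromolecule, mL/g
    rho_buffer : buffer density at the run temperature, g/mL
    eta_buffer : buffer viscosity at the run temperature, poise
    The defaults are approximate values for water at 20 degC.
    """
    # Ratio of buoyancy terms: standard solvent over experimental buffer
    buoyancy = (1.0 - vbar * rho_water_20) / (1.0 - vbar * rho_buffer)
    # Ratio of viscosities: experimental buffer over standard solvent
    viscosity = eta_buffer / eta_water_20
    return s_exp * buoyancy * viscosity


# Hypothetical example: a 4.0 S species in a slightly denser, more viscous buffer
corrected = s20w(4.0, vbar=0.73, rho_buffer=1.005, eta_buffer=1.03e-2)
print(f"s20,w = {corrected:.2f} S")
```

Because both factors exceed one for a buffer that is denser and more viscous than water, the corrected value is larger than the measured one.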
Going a step further, and taking advantage of the different
optical properties (i.e., extinction coefficients) of the
sedimenting species within the sample, SV can be carried out
recording data at different wavelengths simultaneously in the same
run (multi-signal SV, MSSV). The resulting combination of
hydrodynamic and spectral information can be globally analyzed
through the “multi-wavelength discrete/continuous distribution”
model implemented in the SEDPHAT software to determine the
spectrally and diffusion-deconvoluted sedimentation coefficient
distributions, $c_k(s)$. This analysis is particularly useful when
studying interactions that result in multiple coexisting complexes
in a mixture of sedimenting particles, as it provides information
about the composition of these complexes, enabling the
determination of the association scheme and stoichiometry [24].

Up to eight samples may be run at the same time in a single
experiment lasting 2-7 h, depending on the molar mass of the
sedimenting species. Sample volumes loaded into analytical cells
are 400 µL for standard cells (1.2 cm pathlength) and 100 µL for
narrow cells (0.3 cm pathlength). Most of the buffers commonly used
for in vitro analysis of macromolecules may be used, though
high-absorbance additives (such as reducing agents) and viscous
protectants should be kept at low concentration for better results
(for example, concentrations lower than 0.1 mM for
2-mercaptoethanol and 5% for glycerol are preferred). Samples
should be equilibrated with their reference buffer (by dialysis),
which is especially important when using interference optics for
detection.

When using an absorbance detector, sample absorbance should range
from 0.1 to 1.5 OD at the selected wavelength.


7.2.3 Sedimentation Equilibrium (SE) 

SE methods are especially suited to the detection and quantitative
analysis of interactions (stoichiometry, affinity, reversibility)
leading to the formation of macromolecular complexes, including
protein-protein, DNA-protein, and receptor-ligand complexes [1, 13,
25]. The SE technique employs low-to-moderate speeds and longer
run times than SV. When equilibrium is reached, sedimentation in
the centrifugal field is balanced by diffusion, there is no net
transport of molecules, and a concentration gradient is formed
along the cell (Fig. 7.1). The analysis of this gradient provides
information on the molecular weight (Mw) of the macromolecules
present in each sample [4, 11], enabling the absolute measurement
of their average molecular weights. Furthermore, sedimentation
equilibrium studies can be used to analyze interacting systems. SE
is one of the best methods for the detection and characterization
of reversible macromolecular interactions and the formation of
complexes, allowing the determination of the average molecular
weight, equilibrium constants and stoichiometry [1]. In this
technique, at a given rotor speed, a concentration gradient is
formed (10- to 1000-fold) that depends only on the molecular weight
of the species present and their concentration. Experimental SE
gradients may be analyzed with the HETEROANALYSIS software [26].
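
For a single ideal, non-interacting species the equilibrium gradient has a simple exponential form, c(r) = c(r0) exp[M(1 - v̄ρ)ω²(r² - r0²)/(2RT)], so the buoyant molar mass follows from the slope of ln c against r². The sketch below illustrates this relation only; it is not how HETEROANALYSIS or SEDPHAT work internally. The function names and numerical values are hypothetical, and SI units are assumed throughout.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def se_concentration(r, r0, c0, molar_mass, vbar, rho, rpm, temp_k):
    """Equilibrium concentration at radius r for one ideal species.

    r, r0 in m; molar_mass in kg/mol; vbar in m^3/kg; rho in kg/m^3.
    c0 is the concentration at the reference radius r0 (any unit).
    """
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity, rad/s
    buoyant_mass = molar_mass * (1.0 - vbar * rho)
    exponent = buoyant_mass * omega**2 * (r**2 - r0**2) / (2.0 * R_GAS * temp_k)
    return c0 * math.exp(exponent)


def molar_mass_from_gradient(radii, concentrations, vbar, rho, rpm, temp_k):
    """Recover the molar mass from the least-squares slope of ln c versus r^2."""
    omega = 2.0 * math.pi * rpm / 60.0
    x = [r**2 for r in radii]
    y = [math.log(c) for c in concentrations]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
    return 2.0 * R_GAS * temp_k * slope / (omega**2 * (1.0 - vbar * rho))


# Hypothetical 50 kDa protein (50 kg/mol), vbar 0.73 mL/g, at 12,000 rpm and 20 degC
radii = [0.069 + i * 0.0005 for i in range(7)]
conc = [se_concentration(r, 0.069, 0.1, 50.0, 0.73e-3, 1000.0, 12000, 293.15)
        for r in radii]
print(molar_mass_from_gradient(radii, conc, 0.73e-3, 1000.0, 12000, 293.15))
```

For interacting systems the gradient becomes a sum of such exponentials linked by mass action, which is why data at several speeds and loading concentrations are fitted globally.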

The molar masses of the macromolecules and 
their complexes suitable for their characterization 
by this technique can range from less than 2 x 10° 
to more than 10° Da and, in reversible homo- and 
hetero-associations, affinities in the range of 10+ 
10° M~' can be determined. For the analysis 
of homo- and hetero-associations, data collected
at different speeds and different loading
concentrations are globally analyzed in terms of
different association models using the SEDPHAT
software [27]. Proteins and protein domains, as
well as detergent-solubilized membrane proteins,
may be characterized by this method [28-30].

Up to 21 samples may be analyzed in a single SE experiment, using
special six-hole analytical cells that allow three samples plus
their references per cell. Two or three running speeds are selected
according to the molecular weight of the macromolecules to be
tested and, when a macromolecular interaction is being studied, of
the complexes that may form. After collecting the equilibrium
scans, a final high-speed centrifugation run (50,000 rpm) is
conducted to estimate the corresponding baseline offsets. Before an
SE experiment, gel filtration of the macromolecules is recommended
to remove low-Mw contaminants and large aggregates. It is also
highly recommended to run an SV experiment on the samples
beforehand to check their homogeneity and the number of species
present, which will determine the probability of success of the SE
experiment, as samples with multiple species are difficult to
analyze.

Buffer requirements are similar to those described for SV
experiments. Sample volumes are 160-180 µL for long-column
experiments using standard cells (1.2 cm pathlength) or 45 µL using
narrow cells (0.3 cm pathlength). SE experiments last longer than
SV ones because reaching the equilibrium concentration gradient at
each running speed typically requires long periods of time, and a
typical multi-speed experiment takes several days. Some proteins
are not stable over these long periods, and short columns (100 µL
sample), which significantly reduce the equilibration time, have
been used satisfactorily, with the drawback of fewer data points
and lower accuracy.


7.3 Light Scattering (LS) Techniques

7.3.1 Dynamic Light Scattering (DLS)


Dynamic light scattering is a non-invasive technique for measuring
the size of particles or molecules diffusing in solution due to
Brownian motion. Brownian motion is the random movement of
particles in suspension, and is influenced by particle size, sample
viscosity and temperature. A characteristic of Brownian motion is
that large molecules move more slowly than small ones. DLS is also
known as quasi-elastic light scattering (QELS). In a DLS
experiment, samples are illuminated with monochromatic light, and
the variation of the intensity of the scattered light over time is
recorded. The time dependence of this variation contains
information on the Brownian motion of the molecules in the sample
and is a measure of their translational diffusion coefficient
($D_t$). $D_t$ is inversely related to the hydrodynamic size of the
macromolecules (expressed as the hydrodynamic radius, $R_h$). DLS
provides information regarding the size and polydispersity index of
molecules and particles in solution (Fig. 7.1). DLS is routinely
used to determine the size of proteins, nucleic acids, vesicles and
macromolecular complexes and is very well suited for the detection
of small amounts of aggregates in a sample [31].
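
The inverse relation between the diffusion coefficient and the hydrodynamic radius is the Stokes-Einstein equation, R_h = k_B T / (6πηD_t). The following minimal sketch assumes a spherical particle in a dilute solution; the function name, the default viscosity (approximately that of water at 20 °C) and the example diffusion coefficient are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius(d_t, temp_k=293.15, viscosity=1.002e-3):
    """Stokes-Einstein hydrodynamic radius in m.

    d_t       : translational diffusion coefficient, m^2/s
    temp_k    : absolute temperature, K
    viscosity : solvent viscosity, Pa s
    """
    return K_B * temp_k / (6.0 * math.pi * viscosity * d_t)


# Hypothetical protein with D_t = 6.0e-11 m^2/s (6.0e-7 cm^2/s)
r_h = hydrodynamic_radius(6.0e-11)
print(f"R_h = {r_h * 1e9:.2f} nm")
```

Halving the diffusion coefficient doubles the radius, which is why aggregates, diffusing much more slowly than monomers, stand out clearly in DLS data.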

From the sedimentation coefficient obtained 
by SV and the independently measured diffusion 
coefficient by DLS or FCS the Mw of macromo- 
lecular species can be determined by using the 
Svedberg equation [32]: 


$$\frac{s}{D_t} = \frac{M_w\,(1 - \bar{v}\rho)}{RT}$$

where $\bar{v}$ is the partial specific volume of the
macromolecule; $\rho$ is the solvent density; R is the
gas constant, and T is the absolute temperature.
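
As a worked illustration of the Svedberg equation, the sketch below rearranges it to Mw = sRT / [D_t(1 - v̄ρ)] in CGS units. The function name and the input values are assumptions chosen for illustration, not measurements from this chapter; s and D_t are taken to refer to the same solvent conditions.

```python
R_GAS_CGS = 8.314462618e7  # gas constant, erg mol^-1 K^-1


def svedberg_molar_mass(s, d_t, vbar, rho, temp_k=293.15):
    """Molar mass in g/mol from the Svedberg equation.

    s    : sedimentation coefficient, seconds (1 S = 1e-13 s)
    d_t  : translational diffusion coefficient, cm^2/s
    vbar : partial specific volume, mL/g
    rho  : solvent density, g/mL
    """
    return s * R_GAS_CGS * temp_k / (d_t * (1.0 - vbar * rho))


# Hypothetical species: s = 4.3 S, D_t = 6.1e-7 cm^2/s, in water at 20 degC
mw = svedberg_molar_mass(4.3e-13, 6.1e-7, vbar=0.733, rho=0.9982)
print(f"Mw = {mw / 1000:.1f} kDa")
```

Because shape affects s and D_t in opposite ways through the same frictional coefficient, their ratio yields a molar mass that does not depend on any assumption about molecular shape.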

The DLS technique is compatible with a great variety of sample
buffers and a wide range of temperatures. The instrument is
composed of a laser light source, a thermostated sample cell, a
detector placed at a fixed or variable angle (90° is the standard,
but other angles can be used), a photomultiplier that amplifies the
signal, and a correlator. Samples may be measured after
fractionation by size exclusion chromatography (SEC-DLS) or
non-fractionated in a cuvette (Batch-DLS). In SEC-DLS the $D_t$ of
the different species separated by the column may be precisely
determined (e.g., monomers and dimers, which are otherwise
indistinguishable in terms of diffusion, can be characterized
provided the selected column has adequate resolving power). In the
case of Batch-DLS a size distribution may be obtained, but two
particles must differ by at least a factor of two in their $R_h$ to
be differentiated (monomers, dimers, trimers and tetramers appear
as a single peak in the size distribution and cannot be
distinguished).

DLS buffers must be filtered and degassed, and samples should be
filtered or centrifuged for 10-15 min at low speed (11,000 × g) to
eliminate bubbles and very large aggregates. For Batch-DLS,
automated plate readers suitable for the simultaneous analysis of
many samples are available. Only a few microliters per sample
(4-15 µL) are needed, and samples may be recovered after
measurement if necessary. The minimal concentration required for a
good signal-to-noise ratio depends on the molecular mass of the
macromolecule (e.g., for lysozyme, with an Mw of 14,700 Da, a
concentration of at least 0.2 mg/mL is needed when using a cuvette
with 1 cm pathlength). The lower and upper molecular size limits
are in the order of 0.5 and 1000 nm, respectively.


7.3.2 Multi-angle Light Scattering (MALS)


When light passes through a sample, some of the incident light is
scattered by the particles into new directions. A careful analysis
of the light scattered at different angles yields detailed
information about the scattering particle. The intensity of the
scattered light is directly proportional to the molar mass and the
concentration of the particles. Multi-angle light scattering (MALS)
is one of the most reliable methods for the determination of the
mass of macromolecular complexes [33-38]. It measures the intensity
of the light scattered by macromolecules in solution, detected at
several angles (from 2 to 18), allowing the direct determination of
their molecular weight. Two main analytical MALS methods are
available: size exclusion chromatography coupled to MALS (SEC-MALS)
and composition gradient MALS (CG-MALS).

7.3.2.1 Size Exclusion Chromatography Coupled to Multi-angle Light Scattering (SEC-MALS)

SEC-MALS allows the determination of the absolute mass of
macromolecules fractionated according to their size and, for large
molecules whose scattering shows angular dependence, also an
estimation of the radius of gyration [36, 38]. In the standard
configuration, a SEC column is coupled in-line with refractive
index (RI), multi-angle light scattering (MALS) and UV-VIS
detectors, though the detectors may also be used in parallel to
minimize sample dilution. The column must be equilibrated in the
running buffer and the signal of the detectors must be stable
before starting the experiment. The flow rate depends on the column
size (e.g., 0.5 mL/min is used for a semi-preparative 10/300 mm
column) and the system may be run at room temperature, though
experiments can be conducted at other temperatures using a
thermostated system. A typical experiment lasts from 40 to 60 min
for a 10/300 column (24 mL bed volume). Depending on the column
characteristics, proteins with molecular masses from 200 Da to
10 MDa can be analyzed. Running buffers must be degassed and
filtered beforehand and should be compatible with the SEC column
employed. To avoid unspecific interactions with the column matrix,
some salt should be added to the running buffer (e.g., 150 mM
NaCl). Samples are equilibrated in the running buffer, and the
sample volume depends on the dimensions of the column. For 10/300
columns, the usual volumes are 20-200 µL (50-500 µg of protein).

SEC-MALS has been successfully used for the characterization of,
among others, membrane proteins solubilized with detergents [39],
adeno-associated virus [40], and polysaccharide-protein complexes
[41]. SEC-MALS is used for protein and polymer characterization in
industry [5] because of its high reproducibility, short running
time and the possibility of automating the whole procedure.

7.3.2.2 Composition Gradient Multi-angle Light Scattering (CG-MALS)

Composition-gradient MALS (CG-MALS) is a technique for quantifying
the affinity and stoichiometry of macromolecular interactions. In
CG-MALS, simultaneous measurements of the intensity of scattered
light and of the species concentration are collected (Fig. 7.1),
from which the weight-average molar mass of the species in
equilibrium, and hence the stoichiometry and affinity of the
interaction, can be obtained. For that purpose, light scattering
data and concentration data from the RI detector are continuously
acquired from a solution whose composition is varied with time in a
controlled and known fashion. The resulting time-dependent
scattering and composition profiles are globally fitted with
molecular models for these composition-dependent scattering data
[42].

A pump system with three syringes injects, in sequence and at the
time intervals required for the programmed concentration gradient,
three different components (interacting species A and B, and
buffer) into the multi-angle light scattering and concentration
detectors; the latter may be a UV-VIS or an RI detector.

The CG-MALS set-up is especially useful for the analysis of complex
stoichiometries and when the interacting macromolecules exhibit
self-association in addition to hetero-association [43]. Kd values
from 0.1 µM up to the mM range may be determined by this technique.
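
To illustrate how the weight-average molar mass reports on an interaction, the sketch below computes it for the simplest case, a 1:1 hetero-association A + B ⇌ AB. It is a deliberately simplified model: an ideal solution, equal refractive index increments for all species and no self-association are assumed, and the function name and numerical values are hypothetical. Real CG-MALS analysis fits the scattering data directly rather than this derived quantity.

```python
import math


def weight_average_molar_mass(a_total, b_total, kd, mass_a, mass_b):
    """Weight-average molar mass (g/mol) for A + B <-> AB at equilibrium.

    a_total, b_total, kd : total concentrations and dissociation constant, mol/L
    mass_a, mass_b       : molar masses of A and B, g/mol
    """
    # Exact solution of the mass-action quadratic for the complex concentration
    total = a_total + b_total + kd
    ab = (total - math.sqrt(total * total - 4.0 * a_total * b_total)) / 2.0
    species = [
        (a_total - ab, mass_a),       # free A
        (b_total - ab, mass_b),       # free B
        (ab, mass_a + mass_b),        # complex AB
    ]
    weight_conc = sum(c * m for c, m in species)  # total mass concentration, g/L
    return sum(c * m * m for c, m in species) / weight_conc


# Hypothetical 50 kDa and 25 kDa partners mixed at 1 µM each, Kd = 1 µM
print(weight_average_molar_mass(1e-6, 1e-6, 1e-6, 50000.0, 25000.0))
```

Scanning the A:B ratio at fixed Kd produces a maximum in the weight-average molar mass near the stoichiometric composition, which is the feature exploited in a composition gradient.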


7.4 Fluorescence Spectroscopy Approaches


Fluorescence spectroscopy methods have been widely used to
interrogate biomolecular interactions involving proteins, nucleic
acids, lipid membranes and small ligands [7, 44, 45]. These methods
are generally characterized by their outstanding sensitivity, which
reaches the detection of single molecules. Because of this notable
sensitivity, fluorescence techniques allow the energetic parameters
of high-affinity interactions to be assessed under true equilibrium
conditions (i.e., at concentrations of the target fluorescent
species below the dissociation constant). Complexes that require
high concentrations of their components to assemble can
nevertheless also be studied by fluorescence. For this purpose, a
useful strategy is often the addition of a small fraction of one of
the reactants labeled with a fluorophore, which acts as a tracer
representing the whole amount of this species (that is, including
the unlabeled fraction; see, for example, [46]). This overcomes the
limitations imposed by the upper detection limit and by the lack of
linearity between the fluorescence signal and the fluorophore
concentration above a certain concentration threshold.

Fluorescence methods have been applied to study protein dynamics
and interactions, making efficient use of the intrinsic
fluorescence displayed by many proteins harboring tryptophan
residues [47]. In other instances, labeling of the protein, nucleic
acid, ligand or lipid with an extrinsic dye is either necessary or
more convenient, depending on the kind of study intended [45, 48].
The possibility of including labeled species is particularly
attractive for measurements in complex media and, in fact, it is
one of the reasons for the success of fluorescence approaches for
quantitative determinations in crowded solutions, reconstituted
cell-like systems, fixed or live cells and even in tissues and
multicellular organisms [49-52]. Among the extrinsic dyes, many
different small organic molecules are commercially available,
modified with suitable reactive groups to be conjugated with
proteins, usually at amino groups (N-terminus or lysine residues)
or at cysteine residues. Additionally, fluorescently labeled
oligonucleotides, lipids and certain small ligands can be purchased
from various companies. Proteins labeled by fusion with a
fluorescent protein are routinely used for measurements in cells
although, occasionally, this type of labeling has also been used
for in vitro measurements in solution and in reconstituted
cell-like systems. Selection of the fluorophore or fluorophores is
a key step in the design of a fluorescence-based assay, and it
should consider not only the specific requirements of the
fluorescence method (number of fluorophores, excitation and
emission wavelengths of each fluorophore, level of photostability,
quantum yield, etc.) but also the type of interaction under study
and the complexity of the system [48, 53]. This is important to
minimize the contribution of background species to the fluorescence
signal and to avoid alterations in the target interaction
introduced by the dye.

There are many different fluorescence 
methods that can be applied individually or, 
more efficiently, combined with each other or 
with additional biophysical or structural methods 
to characterize biomolecular complexes. Changes 
in the fluorescence emission subsequent to the 
interaction of a molecule bearing a fluorophore 
with a partner can be of use to monitor this 
interaction employing a regular spectrofluorome- 
ter [45]. When complexes larger than the free 
species are formed, methods such as fluorescence 
anisotropy or fluorescence correlation spectros- 
copy (FCS) can be applied to determine the 
affinity and the possible cooperativity of the 
interaction. 

Fluorescence anisotropy [54] requires an instrument equipped with
polarizers and is very sensitive to size changes, as it depends on
the rotational diffusion of the species. Aside from the global
tumbling, local motions of the dye also contribute to the measured
steady-state anisotropy and, by performing time-resolved
measurements, these motions can be identified and quantified.
Anisotropy methods have been widely used to unravel the molecular
mechanisms of processes in which protein-protein, protein-nucleic
acid or protein-ligand complexes are involved. A powerful approach
to this end entails the parallel application of analytical
ultracentrifugation or light scattering methods for the independent
determination of stoichiometries [55, 56]. Suitable models to
analyze the anisotropy isotherms (Fig. 7.1) based on these
stoichiometries can subsequently be built with software such as
BIOEQS [57-59].
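
The simplest model for an anisotropy isotherm is a 1:1 complex between a labeled tracer and an unlabeled partner. The sketch below is an illustration only and is not the BIOEQS approach, which handles general stoichiometries numerically. It assumes that the fluorescence intensity of the tracer does not change upon binding, so the observed anisotropy is the fraction-weighted average of the free and bound values; all names and numbers are hypothetical.

```python
import math


def fraction_bound(tracer_total, partner_total, kd):
    """Fraction of labeled tracer in a 1:1 complex (exact quadratic solution).

    All concentrations and kd are in the same molar units.
    """
    total = tracer_total + partner_total + kd
    complex_conc = (total - math.sqrt(total * total
                                      - 4.0 * tracer_total * partner_total)) / 2.0
    return complex_conc / tracer_total


def anisotropy(tracer_total, partner_total, kd, r_free, r_bound):
    """Observed anisotropy, assuming equal brightness of free and bound tracer."""
    f = fraction_bound(tracer_total, partner_total, kd)
    return r_free + (r_bound - r_free) * f


# Hypothetical titration: 1 nM tracer, Kd = 100 nM, anisotropy rising from 0.05 to 0.25
for partner in (0.0, 1e-8, 1e-7, 1e-6, 1e-5):
    print(partner, round(anisotropy(1e-9, partner, 1e-7, 0.05, 0.25), 4))
```

When the tracer concentration is far below Kd, the midpoint of the isotherm falls close to Kd. If the quantum yield changes upon binding, the fractions must be weighted by the relative brightness of each state.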

FCS measurements, on the other hand, rely on 
the analysis of the fluctuations in the intensity 
arising from the fluorescent species within a 
small open volume created by a laser beam 
focused into the sample through the objective of 
a microscope [60-62]. From these fluctuations,
autocorrelation curves are obtained (Fig. 7.1) 
that contain information about the concentration 
and translational diffusion of the fluorescent spe- 
cies under study. FCS is very useful for the anal- 
ysis of protein aggregation and polymerization 
events, for which anisotropy is more limited. In 
contrast, because it depends on the translational 
diffusion, FCS sensitivity to the formation of 
complexes moderately larger than the free species 
is, as with DLS, lower than in the case of anisotropy.
Interactions can also be detected and
quantified by using the two-color version of 
FCS, fluorescence cross-correlation spectroscopy 
(FCCS [63, 64]) with the only requirement that 
the interacting species, labeled with spectrally 
different dyes, move together upon complexation 
(i.e., there is no minimum size change needed for 
the complexes to be detected). In addition to the 
autocorrelation curves corresponding to the 
fluorescence fluctuations in each channel, a 
cross-correlation curve is obtained in FCCS 
upon formation of complexes bearing the two 
fluorophores (Fig. 7.1). In the absence of artifacts, 
the appearance of this cross-correlation curve 
proves interaction, and its amplitude, relative to 
that of the autocorrelation curves, allows affinity 
evaluation as well as assessment of the stoichiometry
of the complexes formed [65-67].
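
For readers who wish to see how concentration and diffusion enter the analysis, the sketch below evaluates the autocorrelation function commonly used for a single species diffusing freely in three dimensions through a Gaussian observation volume, and converts a diffusion time into a diffusion coefficient. The model choice (one species, no triplet or blinking terms), the structure parameter and all names are assumptions for illustration rather than the analysis applied in the cited studies.

```python
import math


def autocorrelation(tau, n_particles, tau_d, structure=5.0):
    """G(tau) for one freely diffusing species in a 3D Gaussian observation volume.

    n_particles: mean number of fluorescent particles in the volume;
    tau_d: diffusion time; structure: axial-to-lateral ratio of the volume.
    """
    lateral = 1.0 / (1.0 + tau / tau_d)
    axial = 1.0 / math.sqrt(1.0 + tau / (structure ** 2 * tau_d))
    return lateral * axial / n_particles


def diffusion_coefficient(tau_d, beam_waist):
    """Translational diffusion coefficient, D = w0^2 / (4 tau_D)."""
    return beam_waist ** 2 / (4.0 * tau_d)


if __name__ == "__main__":
    # Hypothetical values: 2 particles on average, tau_D = 100 us, w0 = 0.25 um.
    for tau in (1e-6, 1e-5, 1e-4, 1e-3, 1e-2):
        print(tau, autocorrelation(tau, 2.0, 1e-4))
    print(diffusion_coefficient(1e-4, 0.25e-6), "m^2/s")
```

The amplitude at short lag times is inversely proportional to the number of particles, while the decay time reports on diffusion; this is why the curves carry information on both concentration and size.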

For additional details on the above-mentioned
fluorescence methodologies, their experimental
requirements and their application to the study of
interactions, readers are referred to a comprehensive
and didactic review [7], in which another
popular fluorescence method, Förster Resonance
Energy Transfer (FRET), is also discussed.

Aside from the usefulness of fluorescence 
methods to unravel mechanisms of interaction in 
the context of fundamental research schemes, they 
also find many applications in biotechnology and 
biomedicine. Thus, fluorescence approaches are
among those of choice for the development of
screening assays, due to the low amount of sample 
required and the rapidity of the measurements, 
which can often be performed using plate readers 
[45]. Many examples of the use of anisotropy and 
FCS to develop systematic screening assays in vitro 
and in vivo can be found in the literature [68-71]. 


7.5 Global Application of these Techniques to the Study of Complex Formation

7.5.1 Quantitative Analysis of the Self-Association and Activation of the Bacterial Chaperone ClpB


In the following example, modulation of the 
functional oligomerization of the protein ClpB 
was addressed by using a combination of 
analytical ultracentrifugation and light scattering 
techniques. ClpB is a molecular chaperone that 
belongs to the Hsp100 family of ring-forming 
heat-shock proteins involved in the protein 
quality control system in bacteria. This protein, 
in combination with the DnaK system,
disaggregates unfolded and aggregated proteins,
thereby reactivating them. ClpB was previously
described to self-associate from monomers
to form hexamers (the functional unit) but it was 
not known how factors such as protein and 
salt concentration and natural ligand (ADP and 
ATP) binding modulate this equilibrium. The 
techniques SE, SV and CG-MALS were applied 
to get some insight into these aspects [72]. 

SV was first applied to characterize samples of 
ClpB at two different salt conditions and to eval- 
uate their polydispersity. Profiles showed that at 
low ionic strength (50 mM KCl) 10 µM ClpB
sedimented as a single peak compatible with the 
protein hexamer, and at high ionic strength 
(500 mM KCl) as a single peak compatible with 
protein monomer. The presence of nucleotide 
(ADP or ATP) induced a slight decrease of the 
sedimentation coefficient but had no significant 
effect on the association state that at low ionic 
strength remained mainly hexameric. To further
analyze the association scheme of the protein and
its modulation by ligands, two techniques were
employed: SE and CG-MALS. These techniques
allowed direct determination of the masses,
overcoming the limitations of SV for this purpose,
especially when several species are present and
the molecular masses cannot be determined.
Application of these methods benefited from the
previous analysis by SV, which indicated the
species populating the solutions under each
condition tested, meaning that the average masses
retrieved could be assigned to a defined species or
a mixture of them. The dependence of the
oligomerization state of ClpB on protein and salt
concentration (50-500 mM KCl), as well as on the
presence of the natural ligands, was subsequently
determined by SE and CG-MALS. A decrease in ionic
strength shifted the equilibrium isotherms toward
the formation of hexamers.

The standard Gibbs free energy change calcu- 
lated from the ClpB hexamerization constant 
showed that ATP-bound ClpB was 10 kJ/mol 
more stable than the ADP state. Nucleotide 
exchange might promote the conformational change 
of the protein that drives its functional cycle. 
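
To put this free energy difference in perspective, the relation ΔG° = -RT ln K can be inverted to express it as a ratio of association constants. The short sketch below does this in Python; the temperature of 25 °C is an assumption (it is not stated here), and the function names are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)


def standard_free_energy(k_assoc, temperature=298.15):
    """Standard Gibbs free energy change (J/mol) from an association constant.

    k_assoc must be expressed relative to the chosen standard state.
    """
    return -GAS_CONSTANT * temperature * math.log(k_assoc)


def constant_ratio(delta_delta_g, temperature=298.15):
    """Ratio of two association constants whose free energies differ by delta_delta_g (J/mol)."""
    return math.exp(delta_delta_g / (GAS_CONSTANT * temperature))


if __name__ == "__main__":
    # A stabilization of 10 kJ/mol, as quoted in the text, at an assumed 25 degrees C.
    print(constant_ratio(10_000.0))
```

Under these assumptions, a stabilization of 10 kJ/mol corresponds to a hexamerization constant that is between 50- and 60-fold larger.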


7.5.2 Untangling the Central Bacterial Division Protein FtsZ Polymers


The characterization of the self-association of 
FtsZ [23], the major component of the bacterial 
division ring whose reversible GTP-induced 
polymerization plays a key role in cytokinesis 
[73, 74], represents a good example of how the
application of multi-parametric approaches com- 
bining several orthogonal methodologies can aid 
in the challenging investigation of fibrils of 
medium-large size. The difficulty in the analysis 
of FtsZ assemblies is further enhanced by the 
variability in the degree of polymerization and 
the final arrangement of these dynamic polymers, 
very sensitive to the specific conditions under 
which they are formed. 

Initial studies on FtsZ polymers triggered by 
GTP or GMPCPP (a slowly hydrolysable
analogue of GTP), conducted at 500 mM KCl
and 5 mM MgCl2, showed single peaks in SV
profiles, with faster sedimentation for the
GMPCPP polymers (Fig. 7.2a). This was compatible
with either a larger mass, or an equal mass but
a more compact shape or a more flexible structure
for the polymers induced by the analogue. The
ambiguity was resolved through complementary
FCS and DLS measurements (Fig. 7.2c), rendering
Dt-values that, together with the s-values, were
used to calculate, via the Svedberg equation (see
above), the molecular mass of the polymers.
This estimation indicated that the difference 
observed in the sedimentation of the polymers 
was related to their different mass. The joint 
application of FCS and DLS allowed bypassing 
the uncertainties associated with each method in 
the non-straightforward analysis of solutions 
containing polymers. Thus, while FCS measured
the polymer diffusion, the profiles were also
sensitive to the presence of unassembled protein,
which, besides providing additional characterization
of the system, increased the number of fitting
parameters and hence the uncertainty of the
analysis. DLS autocorrelation functions, on the other
hand, were dominated by the larger species
(given the substantially smaller size of the
unassembled species), hence providing accurate
measurements of the diffusion of the polymer.
However, DLS measurements are highly sensi- 
tive to dust and any kind of unwanted aggregates 
that might be present in the solution. The good 
agreement between DLS and FCS data allowed 
validating the models used for analysis in each 
case and discarding an influence of the dye used 
to label FtsZ on the FCS determinations. 
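
The mass calculation from paired s- and D-values referred to above can be sketched in a few lines of Python using the Svedberg equation, M = sRT / [D(1 - v̄ρ)]. The numerical inputs below (sedimentation and diffusion coefficients, partial specific volume, solvent density and temperature) are placeholder values chosen for illustration, not the FtsZ data.

```python
GAS_CONSTANT = 8.314462618  # J/(mol K)


def svedberg_molar_mass(s, d, vbar, rho, temperature):
    """Molar mass in kg/mol from the Svedberg equation (all inputs in SI units).

    s: sedimentation coefficient (s); d: diffusion coefficient (m^2/s);
    vbar: partial specific volume (m^3/kg); rho: solvent density (kg/m^3);
    temperature: absolute temperature (K).
    """
    buoyancy = 1.0 - vbar * rho
    return s * GAS_CONSTANT * temperature / (d * buoyancy)


if __name__ == "__main__":
    # Placeholder values: 4.3 S, 6.0e-11 m^2/s, 0.73 mL/g, water-like solvent at 20 degrees C.
    mass = svedberg_molar_mass(4.3e-13, 6.0e-11, 0.73e-3, 1000.0, 293.15)
    print(f"{mass:.1f} kg/mol (numerically equal to kDa)")
```

Because the frictional coefficient cancels when s is divided by D, the result does not depend on any assumption about particle shape, which is what makes the combination of the two measurements so useful.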
Multi-angle light scattering measurements, 
in the concentration-gradient setup, were 
undertaken to obtain an independent and direct 
determination of the molar mass of the polymers. 
In this case, experimental data analysis involved 
applying a model that considers the dependence 
of the intensity of the scattered light with the 
protein concentration and detection angle 
(Fig. 7.2b). The obtained masses were similar to 
those from the combination of s- and Dt-values,
confirming the higher mass for the GMPCPP
polymers. This dependence analysis also provided
an estimate of the dimensions of FtsZ
polymers through the calculation of the radius of
gyration, somewhat larger for GMPCPP-FtsZ
polymers. The molecular weight distributions
calculated from c(s) distributions and externally
determined frictional ratios via the Svedberg
equation [18] were in good agreement with the
results derived from the other techniques employed.

Fig. 7.2 Characterization of FtsZ polymers in the presence
of GTP and GMPCPP using several biophysical
techniques. Illustration of (a) FtsZ sedimentation velocity
profiles in the presence of either GTP or GMPCPP,
(b) angular and concentration dependence of scattering
of FtsZ polymers and (c) dependence with FtsZ
concentration of translational diffusion properties of FtsZ
polymers formed by GTP or GMPCPP

The described orthogonal approach was then
used to determine the impact of protein, salt and
magnesium concentrations on the FtsZ polymers.
Within the 0.4-1.5 g/L protein concentration
interval, at 5 mM MgCl2 and 500 mM KCl, the
polymers triggered by GTP and GMPCPP
displayed constant s- and Dt-values and hence
constant mass, suggesting a concerted formation
of preferred FtsZ fibrils and highlighting the
qualitatively similar self-association schemes for both
kinds of polymers despite their different size.
At lower KCl concentrations the GTP polymers
were smaller but still narrowly distributed in size,
in excellent agreement with reported electron 
microscopy (EM) data, while the GMPCPP 
polymers increased in size and heterogeneity 
[75]. The GTP polymers exhibited their characteristic
narrow size distribution in the 5 to 0.25 mM
MgCl2 interval. However, at lower MgCl2
concentrations, SV showed two main species
with s-values smaller than those of the polymer
and a significant increase in Dt was found by DLS
and FCS, indicating a smaller size. The GMPCPP
polymers were less sensitive to the MgCl2
concentration, and no polymers were detected in
solutions lacking Mg2+ regardless of which
nucleotide was present.

An increase in the amount of unassembled
FtsZ was observed by SV and FCS upon lowering
the Mg2+ concentration when polymerization was
elicited by GTP. This suggested an influence of
the cation on the critical concentration of
assembly (Cc), which is a threshold concentration
above which the protein assembles into polymers.
90° scattering and fluorescence anisotropy were
independently applied to verify this hypothesis
(Fig. 7.3). The 90° scattering data showed a
concerted transition between a low-slope region at
low FtsZ concentration and a second region at
higher FtsZ concentration with a substantially
greater slope. Analysis of these data in terms of
a solubility model considering two main scatterer
species (a narrow distribution of low molecular
weight scatterers at concentrations below the
solubility value, and a high molecular weight
scatterer of concentration-independent mean size
at concentrations above it) revealed a drastic
drop in the Cc values for GTP-FtsZ upon increasing
MgCl2 concentration (Fig. 7.3a). The impact
of Mg2+ on the Cc was further explored by a
fluorescence anisotropy assay in which fluorescently
labeled FtsZ was titrated with increasing
concentrations of the unlabeled protein [46]. The
formation of polymers resulted in an anisotropy
increase due, among other factors, to their larger
size compared with that of the unassembled protein.
Analysis of the anisotropy dependence on
protein concentration allowed determination of
the Cc, indicating a substantial decrease with
higher magnesium concentration (Fig. 7.3b), in 
good agreement with 90° scattering analysis. 
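
The idea behind the scattering analysis can be illustrated with a deliberately simplified sketch: straight lines are fitted by least squares to the low- and high-concentration regions, and their intersection is taken as an estimate of the critical concentration. This is a stand-in for, not a reproduction of, the solubility model used in the study; the assignment of points to each region and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def critical_concentration(low_region, high_region):
    """Concentration at which the two fitted lines intersect.

    Each region is a list of (concentration, scattering signal) pairs.
    """
    slope_low, intercept_low = fit_line(*zip(*low_region))
    slope_high, intercept_high = fit_line(*zip(*high_region))
    return (intercept_low - intercept_high) / (slope_high - slope_low)


if __name__ == "__main__":
    # Synthetic data (arbitrary units) built with a break at a concentration of 1.0.
    low = [(c, 2.0 * c) for c in (0.2, 0.4, 0.6, 0.8)]
    high = [(c, 2.0 + 20.0 * (c - 1.0)) for c in (1.5, 2.0, 2.5, 3.0)]
    print(critical_concentration(low, high))
```

With real data the choice of which points belong to each region affects the result, so a global fit of a model covering both regimes is generally preferable.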
This example shows how the use of hydrodynamic
and thermodynamic methods in a complementary
fashion helps overcome the
uncertainties associated with each method when
individually applied to the system, thus providing very
reliable quantitative information on FtsZ polymers.
These detailed analyses were crucial for the 
subsequent characterization of the mechanisms 
used by different modulators of FtsZ assembly in 
the context of division, for example, antagonists 
involved in the positioning of the FtsZ ring at 
mid-cell [76], like MinC and SlmA [56, 77]. They 
also represented a starting point to analyze the influ- 
ence of factors such as the crowded and heteroge- 
neous nature of the bacterial cytoplasm on FtsZ 
polymerization. Along this line, a study based on 
fluorescence anisotropy and light scattering 
evidenced an enhancement of the FtsZ tendency to 
polymerize in the presence of single crowders or
mixtures mimicking the heterogeneity of the
intracellular environment. The effect was more
pronounced with negatively charged proteins or DNA
than with neutral polymers or positively charged 
proteins, and mostly nonadditive for mixed 
crowding agents [78]. 


7.5.3 Unravelling the Interactions of the Bacterial Division Protein FtsZ with the Membrane Anchor ZipA Solubilized in Nanodiscs

Biophysical techniques such as the ones
described here can be applied to study the
interactions of membrane proteins by using
phospholipid bilayer nanodiscs to keep them in
solution. Nanodiscs are highly soluble and stable
particles formed by a phospholipid bilayer
surrounding a target membrane protein, the
whole structure being stabilized by a membrane
scaffold protein [79] providing a native-like
membrane environment.

In this example the membrane protein ZipA, a 
component of the division machinery that 
provides membrane tethering to the central cell 
division FtsZ protein [76], was included in 
nanodiscs (Nd-ZipA) where the phospholipid 
bilayer consisted of a mixture matching the lipid
composition of the E. coli inner membrane. The 
interactions of Nd-ZipA with FtsZ in the absence 
of GTP, under conditions in which the protein is 
forming oligomers (GDP-FtsZ) and with the FtsZ 
polymers elicited by GTP were analyzed by SV, 
SE and FCS techniques [80]. 

Nd-ZipA were purified by SEC and the 
incorporation of ZipA into nanodiscs was initially
verified by electrophoretic analysis. To ascertain 
the incorporation of a single copy of ZipA into 
the biomimetic membrane, empty nanodiscs 
(Nd) and Nd-ZipA were analyzed by analytical 
ultracentrifugation. SV showed that both 
sedimented as single species with different 
sedimentation coefficients (Fig. 7.4a). SE 
experiments were performed in parallel to deter- 
mine the stoichiometry of the Nd-ZipA 
complexes. The comparative analysis of empty
and ZipA nanodiscs demonstrated that the
Nd-ZipA complex contained one molecule of
ZipA incorporated in each nanodisc. The empty
nanodiscs and Nd-ZipA were further
characterized by FCS and DLS, which showed a
slight decrease in the translational diffusion
coefficient upon incorporation of ZipA (Fig. 7.4b). In
both cases, the profiles obtained were compatible
with single homogeneous species. For FCS
measurements, a small fraction of lipids labeled
with a fluorophore was incorporated in the lipid
mixture used to generate the nanodiscs.

Fig. 7.3 Determination of the critical concentration (Cc)
of polymerization of FtsZ. Illustration of (a) concentration
dependence of GTP-FtsZ static light scattering at 90° and
(b) steady-state fluorescence anisotropy measurements of
FtsZ-GTP as a function of protein concentration

A previous study of the hetero-association of a
soluble mutant of ZipA lacking the transmembrane
region (sZipA) with GDP-FtsZ [43] indicated
that, independently of the oligomer size, the
binding affinity was moderate, and only one sZipA
was bound to any of the different FtsZ species
present in the solution (monomer to hexamer).
The analysis by SE showed that the addition of
increasing concentrations of GDP-FtsZ oligomers
to fluorescently labeled Nd-ZipA resulted in
complexes of higher molar mass, following an
association model similar to the one describing the
binding of GDP-FtsZ to sZipA: a single Nd-ZipA
molecule binding to one FtsZ oligomer
independently of the oligomer size.

The interaction of Nd-ZipA with FtsZ
polymers was assessed by FCS, using nanodiscs
labeled in the lipids and unlabeled FtsZ polymers.
Formation of complexes was evidenced by the
large decrease in the speed of diffusion of
Nd-ZipA, subsequent to the interaction with the 
FtsZ polymers triggered by GTP (Fig. 7.4b). 
Titration of FtsZ polymers into solutions 
containing Nd-ZipA allowed quantification of 
the apparent affinity of binding, which turned 
out to be similar to that for GDP-FtsZ oligomers. 
Physical interaction between ZipA and FtsZ in 
the complexes was proved by FCCS, using 
Nd-ZipA and FtsZ labeled with spectrally differ- 
ent fluorophores, namely lissamine rhodamine B 
and Alexa Fluor 488. FCCS experiments showed 
significant cross-correlation for Nd-ZipA but not 
for the empty nanodiscs (Fig. 7.4c), confirming 
an interaction involving FtsZ/ZipA contacts with- 
out interference from the lipids and/or the scaf- 
fold protein in the nanodiscs. Sedimentation 
velocity experiments using the labeled Nd-ZipA 
also evidenced an interaction with FtsZ polymers, 
resulting in a notable increase in the sedimenta- 
tion coefficient of the reconstituted membrane 
protein (Fig. 7.4a). The usefulness of the assays 
developed to study the ZipA/FtsZ interactions for 
the identification of molecules interfering with 
them was shown by including in the reaction
mixture a peptide known to disrupt the complexes 
through competition with FtsZ for the interaction 
with ZipA. 

This work evidenced the little effect, if any,
that the transmembrane region of ZipA has on the
formation of the complex with FtsZ. Likewise, it
shows that FtsZ interaction with ZipA is the same
regardless of the nucleotide present, despite the
different tendency of GDP-FtsZ and GTP-FtsZ
to self-assemble.

Fig. 7.4 Characterization of the interaction between ZipA
embedded in nanodiscs and FtsZ polymers. Illustration of
(a) sedimentation velocity analysis of empty Nd and of
Nd-ZipA, in the absence and presence of GTP-induced
FtsZ polymers, (b) normalized FCS autocorrelation
profiles for Nd and for Nd-ZipA in the absence and
presence of GTP-FtsZ polymers and (c) FCCS analysis
of the interaction of Nd-ZipA with FtsZ polymers labeled
with spectrally different dyes. The flat curve indicates the
absence of cross-correlation from samples containing
empty nanodiscs and FtsZ polymers. Stars indicate
fluorophores


7.5.4 Protein-DNA Complexes: The Interaction Between the Repressor Protein Reg576 and Its DNA Operator


An illustrative example of the power of AUC to
characterize protein-DNA hetero-associations in
solution is provided by the study of the oligomerization
state of the Bacillus pumilus repressor protein
Reg576 alone and once bound to its DNA operator
sequence [81]. The study was approached by combining
MSSV, SE and DLS techniques. In a first step
SV and DLS were used in parallel to ascertain the
oligomerization state of Reg576 alone. At different
concentrations Reg576 behaved in SV assays as a
single species with an experimental sedimentation
coefficient (s) compatible with the theoretical mass
of the nearly globular Reg576 dimer. DLS analysis
of Reg576 yielded a translational diffusion coefficient,
which once introduced with the obtained
s coefficient into the Svedberg equation, resulted
in an apparent molar mass very close to the molecular
mass of the Reg576 dimer. This mass was confirmed
by SE assays with different concentrations of
Reg576 that showed a mass matching the molecular
weight of the protein dimer.

Once the oligomerization state of Reg576 in
solution had been established, the next step was to
ascertain the way Reg576 dimers bind to their DNA
operator. With this aim, and taking advantage of the
different extinction coefficients of Reg576 and the
DNA, MSSV experiments were carried out using
samples containing only the DNA fragment, Reg576
alone, and the DNA fragment with an excess of
Reg576. Three types of DNA fragments were used: the
promoter with its two intact flanking Reg576
operators, or derivatives in which only one or both
flanking operators were mutated.

Fig. 7.5 Combination of SV, SE and MSSV to ascertain
oligomerization state of the DNA-binding protein Reg576.
(a) Sedimentation coefficient distributions, c(s), obtained
from SV assays at 260 nm for the DNA fragment with
intact operators (green traces; upper plot), the DNA with
one mutated operator (blue traces; middle plot) and the
DNA fragment with both operators mutated (brown traces;
lower plot) showing the shift in the s-value of Reg576-DNA
complexes (solid trace) relative to the corresponding
DNA alone (dashed trace). (b) Global multi-wavelength
analysis of the complexes resulting from the interaction of
Reg576 with the intact DNA fragment (green traces; upper
plot) and Reg576 with the DNA fragment with one mutated
operator (blue traces; lower plot) and decomposition into
component sedimentation coefficient distributions, ck(s),
for Reg576 (empty peaks) and the different DNA fragments
(solid peaks). (c) Binding isotherms for the interaction of
Reg576 with the DNA fragment with intact operators
(green circles) and the DNA fragment with one mutated
operator (blue triangles). The solid curves represent the
best fit of the three-parameter Hill equation to the SE
experimental data

As expected, the different DNA fragments in
the absence of Reg576 showed the same s-value,
because they had nearly the same molecular mass.
The presence of Reg576 did not modify the s-value
of the DNA fragment when both operators
were mutated, indicating the absence of interaction.
On the contrary, in the presence of Reg576, a
major s-value increment occurred for the DNA
fragment having both operators intact, and a moderate
increment for the fragment with a single
operator mutated (Fig. 7.5a). The differential
increase in the s-value suggested that the fragment
with intact operators bound more Reg576
dimers than the fragment with one mutated operator.
To address this issue, the absorbance data
acquired simultaneously at 230 and 260 nm were
globally analyzed through SEDPHAT to get the
diffusion-deconvoluted sedimentation coefficient
distributions with spectral deconvolution of the
absorbance signals, ck(s). The MSSV analysis of the
Reg576-intact DNA complex indicated that the
areas under the peaks corresponded to a
stoichiometry of 7.9 moles of Reg576 bound per
mol of DNA. The ratio of Reg576 moles with
respect to the single mutated DNA fragment was
4.3 (Fig. 7.5b). 
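
The principle of the spectral decomposition underlying this kind of multi-wavelength analysis can be illustrated with the Beer-Lambert law at two wavelengths: each signal is a sum of protein and DNA contributions, so two wavelengths give two linear equations in the two unknown concentrations. The sketch below solves that system for a single pair of signals; it is not the SEDPHAT implementation, and the extinction coefficients and concentrations are invented placeholder numbers with the optical path length folded into the coefficients.

```python
def decompose(signal_1, signal_2, eps_protein, eps_dna):
    """Solve the 2x2 Beer-Lambert system for protein and DNA concentrations.

    eps_protein and eps_dna are (wavelength 1, wavelength 2) extinction pairs.
    """
    p1, p2 = eps_protein
    d1, d2 = eps_dna
    det = p1 * d2 - p2 * d1
    if det == 0:
        raise ValueError("the two spectra cannot be distinguished at these wavelengths")
    c_protein = (signal_1 * d2 - signal_2 * d1) / det
    c_dna = (p1 * signal_2 - p2 * signal_1) / det
    return c_protein, c_dna


if __name__ == "__main__":
    # Hypothetical extinction coefficients (per M) at 230 and 260 nm.
    eps_protein = (60_000.0, 8_000.0)
    eps_dna = (250_000.0, 500_000.0)
    # Synthetic signals for 8 uM protein and 1 uM DNA migrating together.
    c_p, c_d = 8.0e-6, 1.0e-6
    a230 = eps_protein[0] * c_p + eps_dna[0] * c_d
    a260 = eps_protein[1] * c_p + eps_dna[1] * c_d
    protein, dna = decompose(a230, a260, eps_protein, eps_dna)
    print(protein / dna)  # molar ratio of protein to DNA in the complex
```

The decomposition is only well conditioned when the two components differ sufficiently in their spectra, which is the reason for exploiting the distinct extinction properties of protein and DNA.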

To corroborate the stoichiometries
determined by MSSV, sedimentation equilibrium
experiments were carried out with a fixed DNA
concentration titrated with increasing Reg576
concentrations. This technique allows determining
the molecular weight of the complexes
formed and, since the molecular weights of the
DNA and protein components are known, the
obtained masses can be used directly to calculate
the number of Reg576 molecules bound to a
DNA fragment. Figure 7.5c shows the binding
isotherms built from the experimental molar
mass increments obtained by SE at low speed
and 260 nm, fitted with an empirical
three-parameter Hill equation. These results, together with
those obtained by MSSV, clearly confirm that
two Reg576 dimers bind to one functional operator.
In addition, these results confirm that mutation
of one operator abolishes the binding of both
Reg576 dimers to this operator.


Abstract


Nuclear magnetic resonance (NMR) and 
native mass spectrometry (MS) are mature 
physicochemical techniques with long 
histories and important applications. NMR 
spectroscopy provides detailed information 
about the structure, dynamics, interactions, 
and chemical environment of biomolecules. 
MS is an effective approach for determining 
the mass of biomolecules with high accuracy, 
sensitivity, and speed. The two techniques 
offer unique advantages and provide solid 
tools for structural biology. In the present 
review, we discuss their individual merits in 
the context of their applications to structural 
studies in biology with specific focus on pro- 
tein interactions and evaluate their limitations. 
We provide specific examples in which these 
techniques can complement each other, 
providing new information on the same scien- 
tific case. We discuss how the field may 
develop and what challenges are expected in 
the future. Overall, the combination of NMR
and MS plays an increasingly important role
in integrative structural biology, assisting 
scientists in deciphering the three-dimensional 
structure of composite macromolecular 
assemblies. 


Keywords 


Biophysics - Nuclear magnetic resonance 
(NMR) - Native mass spectrometry (MS) - 
Protein interactions - Structural biology 


8.1 Introduction 

The three-dimensional shape of a protein is 
described by its tertiary structure. In some cases, 
this structure is well defined and rigid, as in 
globular proteins. In others, it is flexible and 
covers several quite different conformations, 
as in intrinsically disordered proteins. Other 
proteins, probably the majority, include a mixture 
of the two situations. In all cases, structure 
determines protein function, including enzymatic, 
structural, transport, and regulatory functions 
[1]. However, proteins do not usually act alone 
but form large macromolecular complexes that 
coordinate and perform diverse molecular 
functions within the cell [2, 3]. Understanding 
the composition, stoichiometry, affinity, and 
structure of such complexes is a primary goal 
of structural biology. Two techniques have
increasingly emerged to assist structural
biologists in this endeavor: Mass Spectrometry
(MS) and Nuclear Magnetic Resonance (NMR)
spectroscopy. These techniques have a
long history. The development of MS goes all the
way back to the end of the nineteenth century,
when it was used to ionize samples in a magnetic
field and separate the resulting ions according
to their mass-to-charge ratio (m/z) [4]. The first
NMR spectrum was collected in 1949 by 
physicists in search of a proof of the 
nuclear spin [5]. Both techniques are now 
widely exploited by chemists, biologists, 
biotechnologists, and medical doctors with 
applications in chemistry, analytics, biotechnol- 
ogy, materials, diagnostics, and biology. 

The two techniques provide different informa- 
tion at different levels of resolution depending on 
the specific application, but they both have 
become essential tools of structural biology that 
can go hand-in-hand to understand the assembly 
of even large protein complexes. In this review, 
we will focus on the use of these techniques for
assessing molecular interactions. Since
both fields are huge, we will restrict our analysis
to liquid-phase NMR applied to the detection of
molecular interactions and to native MS.
It is worth mentioning that,
although the widespread use of native MS is
relatively recent [6, 7], the technique was
already conceived and used in the 1990s to
study protein folding [8]. We will discuss the
advantages and limitations of each of the two 
techniques in the detection of molecular 
interactions, stress their complementarity and 
provide examples in which they have been used 
together to provide a deeper understanding of a 
specific biological problem. 


8.2 Native MS Applications in Structural Biology


In the last 30 years, native MS has emerged as an 
important tool for the investigation of molecular 
complexes because it preserves non-covalent 
interactions between biomolecules [9-13]. Using 
native MS, we assess the mass and stoichiometry
of intact non-covalent complexes, identify direct
interactions between their components, characterize
stable subcomplexes and assign the relative
position (core vs. periphery) of subunits
(Fig. 8.1). We determine the hierarchy of an
assembly pathway by mixing subunits in a stepwise
manner [14] or by using different pH values
to induce a change in oligomeric state
[15, 16]. Native MS has been combined with
ion mobility (IM) to investigate the shape of
macromolecular assemblies [17-23]. Native MS
coupled with bioinformatics has also been very
powerful for studying the evolutionary history of
protein complexes [24-27].
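
As an illustration of how a measured mass constrains stoichiometry, the following sketch enumerates the subunit combinations whose summed mass agrees with a measured value within a relative tolerance. The subunit names and masses, the copy-number limit and the tolerance are all hypothetical.

```python
from itertools import product


def candidate_stoichiometries(measured_mass, subunit_masses, max_copies=6, tolerance=0.005):
    """List copy-number assignments whose total mass matches the measured mass.

    subunit_masses maps a subunit name to its mass (Da); tolerance is relative.
    """
    names = sorted(subunit_masses)
    matches = []
    for copies in product(range(max_copies + 1), repeat=len(names)):
        total = sum(n * subunit_masses[name] for n, name in zip(copies, names))
        if total > 0 and abs(total - measured_mass) <= tolerance * measured_mass:
            matches.append(dict(zip(names, copies)))
    return matches


if __name__ == "__main__":
    # Hypothetical subunits A, B and C and a hypothetical measured mass of 270 kDa.
    masses = {"A": 50_000.0, "B": 27_000.0, "C": 31_000.0}
    for assignment in candidate_stoichiometries(270_000.0, masses):
        print(assignment)
```

With the placeholder masses used here, more than one combination falls within the tolerance, which illustrates why mass measurements of the intact complex are usefully combined with data on subcomplexes or with complementary techniques.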

During native MS investigations, macromo- 
lecular complexes are subjected to a gentle ioni- 
zation that preserves non-covalent interactions. 
Electrospray ionization (ESI) is the main soft 
ionization method for native MS experiments. 
ESI-MS experiments may be influenced by the 
nature of the intermolecular interactions, by the 
composition, ionic strength, and pH of the sample 
buffer, and by the voltages and pressures within 
the mass spectrometer. Consequently, it is impor- 
tant to take these parameters into account when 
the native MS data are acquired and analyzed. 
Unlike other types of ESI-MS analysis, neither 
acidic conditions nor organic solvents are used. 
Instead, native MS experiments are usually 
performed using volatile buffers such as ammo- 
nium acetate [28, 29], ethylenediammonium 
diacetate [30] or alkylammonium acetate 
[31]. Typically, the buffer is exchanged immedi- 
ately prior to native MS analysis [32]. 
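
Masses are obtained from such spectra by assigning charge states to the peaks. A minimal sketch of the usual arithmetic for two adjacent peaks of the same species is given below, under the assumption that all charges are carried by protons and that adducts are negligible; the function name and the example values are illustrative.

```python
PROTON_MASS = 1.007276  # Da


def charge_and_mass(mz_low, mz_high):
    """Charge of the peak at mz_high and neutral mass of the species.

    mz_low and mz_high are adjacent peaks of one species in positive-ion mode,
    so the peak at mz_low carries exactly one more charge than the one at mz_high.
    """
    charge = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    mass = charge * (mz_high - PROTON_MASS)
    return charge, mass


if __name__ == "__main__":
    # Synthetic peaks for a hypothetical 100 kDa species carrying 21 and 20 charges.
    true_mass = 100_000.0
    mz_21 = (true_mass + 21 * PROTON_MASS) / 21
    mz_20 = (true_mass + 20 * PROTON_MASS) / 20
    print(charge_and_mass(mz_21, mz_20))
```

In practice the full charge-state series is used rather than a single pair of peaks, and incomplete desolvation or buffer adducts shift the measured mass above the sequence mass.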

Native MS provides important structural infor- 
mation when high-resolution data are not available 
[33]. For example, a bacterial complex involved 
in chromosome organization and segregation, 
called MukBEF, was investigated by native MS, 
size exclusion chromatography coupled to multi-angle light
scattering (SEC-MALS), isothermal titration cal- 
orimetry (ITC), epifluorescence microscopy and 
in vivo functional studies [34]. The MukBEF 
complex contains MukB (an ATPase), MukE, 
and MukF. Native MS showed that three major 
complexes were detected [a 2-mer (MukB2), 
6-mer (MukE4:MukF2) and 8-mer (MukB2: 
MukE4:MukF2)] in the presence of ADP. An
additional 10-mer (MukB4:MukE4:MukF2) was
observed in the presence of a non-hydrolysable
ATP analogue. Combined with biophysical
information, these data offered a first glimpse of the
MukBEF organization and its changes after ATP/ADP
binding [34].

Fig. 8.1 The different levels of information provided by
native MS about non-covalent assemblies. When a
macromolecular complex is investigated, the various
components (the whole complex, stripped complexes,
individual components, and subcomplexes) are released
and analyzed (white boxes). MS can provide information
related to binding (yellow boxes) such as stoichiometry,
interface stability, allostery, and equilibria. It can also
provide information related to organization (light blue
boxes). MS measurements provide insight into dynamics,
binding interactions, and structural organization
of a macromolecular complex (green boxes). Arrows show
the flow of information. MS2 and MS3 stand for
experiments that generate first and second generation
product ions

Native MS provides important information on
unexpected complex components. For instance,
in the investigation of membrane
protein complexes, native MS allows the
identification of unknown lipids bound to an assembly.
This is particularly important because lipids are
normally characterized with great difficulty with
other techniques, such as high-resolution
cryo-Electron Microscopy (EM), because of their
small size [35].

Native MS allows the study of the dynamics
of macromolecular assemblies. For instance, an
amyloidogenic protein called transthyretin (TTR)
has been extensively investigated by native MS
[36—40]. Exchange of unlabeled and labeled TTR 
subunits was monitored over time by native MS 
[36, 38—40]. These experiments allowed the 
assessment of the effect of point mutations on 
subunit exchange. MS-based information com- 
bined with neutron crystallography data and 
modelling studies led to propose a novel mecha- 
nism of TTR fibrillation [39]. 


8.3 Advantages and Limitations of Native MS


Native MS presents several advantages compared to other structural
approaches. First, native MS does not require samples to be crosslinked or
labeled because the experimental conditions preserve non-covalent
interactions. Second, it allows the study of a wide variety of biological
samples that differ in mass, symmetry, dynamics, flexibility, and
heterogeneity [9]. For instance, different oligomeric states can be
characterized simultaneously. Since data are not averaged over different
species, specific information is obtained for each individual species. Thus,
the dynamics of quaternary structure can be followed in real time [10, 11].
Finally, native MS is a highly sensitive technique: only a few microliters
of sample at relatively low (µM) concentration are typically required.
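Because each oligomeric state appears as its own species, assigning a
stoichiometry from a measured mass is essentially a bookkeeping exercise.
The following is a minimal sketch of that idea, not a published workflow:
the function name, the subunit labels A, B, and C, their masses, the
tolerance, and the "measured" value are all hypothetical placeholders
chosen for illustration.

```python
from itertools import product


def assign_stoichiometry(measured_mass, subunit_masses, max_copies=4,
                         tolerance=0.005):
    """List subunit combinations whose summed mass matches measured_mass.

    measured_mass  -- experimental mass of the assembly in Da
    subunit_masses -- dict mapping subunit name to its mass in Da
    max_copies     -- largest copy number tried for each subunit
    tolerance      -- accepted relative mass error (0.005 means 0.5 %)
    Returns (copy numbers, theoretical mass, relative error) tuples,
    sorted so that the closest match comes first.
    """
    names = list(subunit_masses)
    matches = []
    for counts in product(range(max_copies + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue  # skip the empty "complex"
        theoretical = sum(n * subunit_masses[name]
                          for n, name in zip(counts, names))
        error = abs(theoretical - measured_mass) / measured_mass
        if error <= tolerance:
            matches.append((dict(zip(names, counts)), theoretical, error))
    return sorted(matches, key=lambda match: match[2])


# Hypothetical subunit masses (Da); round numbers, not experimental values.
SUBUNITS = {"A": 170_000.0, "B": 27_000.0, "C": 51_000.0}

if __name__ == "__main__":
    for composition, mass, error in assign_stoichiometry(550_500.0, SUBUNITS):
        print(composition, f"{mass:.0f} Da", f"{100 * error:.2f} % off")
```

In practice the tolerance would have to reflect the experimental mass
accuracy and any residual solvent or buffer adducts, and several
compositions may remain compatible with one measured mass.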

Regarding the limitations of native MS, it 
should be taken into account that the macromo- 
lecular complexes are detected in the gas phase. 
Since MS analyses are performed under vacuum 
conditions [12, 13], the relative abundances of 
detected assemblies may differ from those in 
solution. Distinct complexes may present 
distinct ionization, transmission, and detection 
probabilities [14]. For instance, electrostatic 
interactions are stronger in the gas phase than in 
solution. Conversely, hydrophobic interactions 
become weaker. On the other hand, the literature 
shows that the transition from solution to the gas 
phase does not drastically alter biomolecules 
[15, 16]: low-energy electron holography con- 
firmed the stability of folded biomolecules in 
ultra-high vacuum [17]. It has also been shown that the behavior of
biomolecules in the gas phase resembles that in solution: when the solution
conditions (e.g., pH and concentrations) were modified, the gas-phase
spectra changed correspondingly [14]. As an example, homo-complexes were
engineered to be stable above pH 6.5 [18]. When the pH was lowered, buried
histidine residues became protonated and these assemblies underwent
cooperative, large-scale conformational changes detected by size-exclusion
chromatography and native MS [18].

Another limitation of native MS is that the experiments must be performed in
volatile buffers (e.g., ammonium acetate). The buffer therefore has to be
exchanged prior to native MS analysis [19]. In some cases, the biochemical
steps involved in sample preparation require optimization to ensure that the
native state of an assembly is preserved during buffer exchange.

Overall, native MS is extremely beneficial to the investigation of
macromolecular assemblies thanks to its selectivity, accuracy, sensitivity,
and analysis speed. Nonetheless, the inherent limitations discussed above
should be borne in mind when data are evaluated.


8.4 NMR Applications in Determining Molecular Interactions


Three main lines of information may be obtained by biological NMR: the
structure of a protein, its dynamics, and the mapping of its interactions
with other molecules (small ligands or macromolecules) (Fig. 8.2). The first
structures solved by liquid-phase NMR date from the early 1980s and were
based on the collection of distance restraints that were then used to
reconstruct the overall three-dimensional structure [20-24]. Since then, the
field has matured so that solving the structure of proteins has become
routine provided that the protein size is within NMR reach. This limit has
been constantly pushed forward and it is now possible to tackle molecular
weights of around 40-50 kDa [25]. Dynamics information may be obtained from
relaxation measurements, which may provide details on flexible or more
rigid regions and on residues in conformational or chemical exchange.

Where, however, NMR remains of particular 
importance is the determination of interactions. In 
complexes of sufficient stability (nanomolar), it is 
possible to obtain the structure of the complex by 
collecting intermolecular distances as obtained 
experimentally or predicted [26, 27]. However, 
other much simpler and more general approaches 
may be used to study protein interactions, which 
yet provide information at atomic resolution. 
Here, we will focus on these, given that they 
might be more directly complementary to native 
MS. 

Fig. 8.2 Summary of the different layers of information obtainable by NMR
on proteins. Everything starts with the assignment of the NMR spectrum
(yellow box). This step is not indispensable but is necessary when
residue-specific information is wanted. There are then three main resulting
routes (green boxes): determination of the protein structure, dynamics, or
interactions. The scheme is meant to provide an overall picture but should
by no means be considered exhaustive

Detection of interactions by NMR relies on changes in the local electron
density that are induced by the spatial proximity of another molecule and
that influence the most easily observable NMR parameter, the chemical shift
[28]. Spatial proximity of groups with magnetic susceptibility anisotropies,
such as aromatic rings, can also induce large changes in chemical shifts,
leading to what is called the ring current shift effect. Detection of the
variations in chemical shift caused by an interaction is called chemical
shift perturbation (CSP) [29]. Typically, CSP is measured by recording a
series of NMR spectra, usually ¹⁵N-Heteronuclear Single Quantum Correlation
(HSQC) experiments, of the ¹⁵N-labeled protein in the absence and in the
presence of varying amounts of the binding partner. It is of course
essential that both components are dissolved in exactly the same buffer and
measured under the same conditions, since chemical shifts, especially those
of amide protons, are very sensitive to differences in pH, temperature, and
buffer composition. If there is an interaction, the chemical shifts of the
residues involved in the complex are displaced from their original position.
There are, however, two limiting cases (Fig. 8.3). In the fast exchange
regime, the two signals collapse into one, whose chemical shift represents
the population-averaged value of the free and complex-saturated protein.
This means that, depending on the relative amounts of protein and interactor
and on the value of the dissociation constant (Kd), the resulting signal
will have a chemical shift in between those of the free and bound states,
following a linear trajectory between the chemical shifts of the two states.
In the slow exchange regime, we observe instead both the signals of the
bound and free states, and the peak volumes represent their relative
amounts. Departures of the HN chemical shifts in the ¹⁵N HSQC from a linear
trajectory may be seen when the interaction involves multiple binding sites
with different affinities. In this case, linearity will be seen only until
the primary, stronger binding site is saturated. After this point, the
chemical shift variation might change direction as the second, weaker
binding site fills. A classic example of this behavior is the titration of
the TAZ2 domain of a transcriptional coactivator with the AD1 domain of the
tumor suppressor p53, in which the resonances of the ¹⁵N-labeled TAZ2 domain
completely change directionality [30].
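As a compact restatement of the fast-exchange case, the observed shift can
be written as a population-weighted average. The symbols used here (p_b for
the bound fraction of the observed protein, and delta_free and delta_bound
for the two limiting shifts) are notation introduced for illustration and
are not taken from the original text.

```latex
\delta_{\mathrm{obs}}
  = (1 - p_b)\,\delta_{\mathrm{free}} + p_b\,\delta_{\mathrm{bound}}
  = \delta_{\mathrm{free}}
    + p_b\,\bigl(\delta_{\mathrm{bound}} - \delta_{\mathrm{free}}\bigr),
\qquad 0 \le p_b \le 1 .
```

Because the same bound fraction applies to both the ¹H and the ¹⁵N
coordinate of a given peak, the peak position moves along the straight line
joining the free and bound positions as p_b grows, which is the linear
trajectory expected for a single binding site.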

Fig. 8.3 The different exchange regimes in NMR detection. Left panel:
During the titration the resonances of the two species (free and bound) can
be in slow exchange (top), fast exchange (bottom), or various degrees of
intermediate exchange. Right panel: Slow and fast regimes as observed in a
two-dimensional HSQC spectrum. In the slow exchange regime, the free and
bound peaks have distinct chemical shifts (top). In fast exchange, the peak
moves gradually from the chemical shift position of the free to the bound
form (middle). Usually, the trajectory is linear. However, in some cases,
the trajectory is non-linear (bottom). In these cases, there is more than
one binding site: linearity is seen until the primary, stronger binding site
is saturated. After this point the chemical shift variation might change
direction as the second, weaker binding site fills

A quantitative estimate of the Kd value can be obtained by plotting the
chemical shift variations measured during the titration. This very simple
method is particularly powerful when, for instance, the interaction involves
two proteins, one of which is above the NMR size limits. In this case, and
provided that the interaction is not so tight as to cause the immediate
disappearance of the spectrum, it is possible to map the surface of
interaction on the smaller component [31].
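One way such a Kd estimate could be extracted from a fast-exchange titration
is sketched below. It assumes a simple 1:1 binding model, which is an
assumption made here and is not stated in the text, and it uses a plain
logarithmic grid search so that only the Python standard library is needed.
The function names, the concentration range, and the synthetic data are
illustrative only.

```python
import math


def fraction_bound(p_total, l_total, kd):
    """Bound fraction of the observed protein for 1:1 binding.

    All concentrations are in mol/L. Uses the exact quadratic solution, so
    it remains valid when protein and Kd are of similar magnitude.
    """
    s = p_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


def fit_kd(p_total, ligand_totals, shifts, kd_min=1e-7, kd_max=1e-1,
           steps=2000):
    """Least-squares estimate of Kd and of the shift change at saturation.

    For every trial Kd on a logarithmic grid, the best saturation shift
    follows in closed form because the model is linear in that parameter.
    Returns (kd, delta_max) for the grid point with the smallest residual.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        fb = [fraction_bound(p_total, l, kd) for l in ligand_totals]
        denom = sum(f * f for f in fb)
        if denom == 0.0:
            continue  # no titration point contains any ligand
        delta_max = sum(f * y for f, y in zip(fb, shifts)) / denom
        sse = sum((y - delta_max * f) ** 2 for f, y in zip(fb, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, delta_max)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free titration generated from assumed parameters.
    protein = 100e-6
    ligand = [0.0, 25e-6, 50e-6, 100e-6, 200e-6, 400e-6, 800e-6]
    observed = [0.30 * fraction_bound(protein, l, 50e-6) for l in ligand]
    kd, delta_max = fit_kd(protein, ligand, observed)
    print(f"Kd ~ {kd * 1e6:.1f} uM, shift at saturation ~ {delta_max:.3f} ppm")
```

With real data one would normally fit several residues together and report
an uncertainty. A multi-site interaction of the kind described for
non-linear trajectories would need a different model.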

An alternative to CSP is saturation-transfer difference NMR (STD-NMR)
[32, 33]. This method, which relies on the Nuclear Overhauser Effect (NOE),
is carried out by irradiating a target protein with a radiofrequency in a
spectral region populated solely by its resonances, giving a spectrum of
intensity Isat. A second spectrum, in which saturation takes place
off-resonance in an empty region of the spectrum, is then acquired (I₀), and
the difference spectrum (ISTD = I₀ − Isat) is plotted; it contains only
signals from the ligand that has received saturation through its interaction
with the protein. The method typically works for interactions in the fast
exchange regime, with Kd values in the range 10⁻⁸ to 10⁻³ mol L⁻¹ [34]. When
probing the interaction between two proteins, the experiment typically
requires uniform isotopic labeling of one of the components with ²H and ¹⁵N,
leaving the other component unlabeled, and the use of 10/90% H₂O/D₂O
mixtures. Under these conditions the labeled protein has a low proton
density, which suppresses most of the spin diffusion [35]. The complex is
irradiated at a frequency that affects only the unlabeled protein, for
instance by choosing aliphatic proton resonances. In this way, spin
diffusion from the unlabeled protein, which has a high proton density, leads
to cross-saturation transfer to the interacting partner. When one of the
partners is RNA, irradiation may take place in the region of the spectrum
where the exchangeable protons of the nucleic acids resonate (10-14 ppm). In
this region of the spectrum, only the resonances of RNA are typically
present [36].
Other equally important approaches to determine interactions by NMR, which
will not be discussed here, are paramagnetic relaxation enhancement,
chemical exchange saturation transfer (CEST), transferred NOE (for small
ligands), and NOE editing/filtering experiments [26, 27]. The information
obtained in all these experiments yields distance restraints, which can then
be translated into the structure of the complex by classical
restraint-driven simulated annealing or docking approaches.


8.5 Advantages and Limitations of NMR in Determining Interactions


One of the main advantages of liquid-phase NMR, as may easily be deduced
from Section 8.4, is that it can provide information on interactions at the
single amino acid level, whereas many other techniques work by averaging the
signal over the whole protein or report only on specific groups, as with
fluorescence. It is also important to understand that, depending on the
affinity of the interaction and the exchange regime, NMR can provide
distance restraints, which may be used either in much the same way as those
used to derive the three-dimensional structure of an individual protein or
to guide an experimentally driven docking procedure. It is, however,
important to remember that the shifting of a particular signal in CSP
experiments does not always indicate that the corresponding residue is in
the interface between the two molecules: shifting can be caused indirectly
by conformational changes induced by binding that propagate throughout the
protein. This limitation does not hold for STD-NMR, which reports only on
residues directly involved in the interaction; it is therefore possible to
perform both experiments and extract more information from their
complementarity.

A second important advantage of NMR is that the technique works over a large
range of affinities, extending to dissociation constants in the millimolar
range, even though it might not be the optimal method to follow the full
titration for very tight complexes. At the other end of the spectrum, NMR
also allows the observation of transient interactions. This feature places
NMR in a unique position, since many techniques, such as isothermal
titration calorimetry or fluorescence spectroscopy, may only be suitable in
very specific ranges of affinities.

The third, but no less important, advantage of NMR lies in the possibility
of obtaining information on flexible or unstructured proteins (the so-called
intrinsically disordered proteins, or IDPs) [37, 38], which are beyond the
reach of other high-resolution structural techniques, such as X-ray
crystallography and cryo-EM.

At the same time, while offering unique advantages, NMR also has undeniable
limitations.

First of all, NMR provides little information about stoichiometry. It is
certainly possible to estimate the molecular weight of a complex from the
correlation time [39] or simply from T₂ relaxation values [40], and to
derive the stoichiometry from these observables. However, the method
provides only a rough estimate, which, for small proteins, is affected by an
error too large to be reliable. The method is also more reliable for
globular proteins.

A second potentially problematic area concerns the difficulty of obtaining
information on systems outside the NMR size range. This size limit is not
easily defined because it is constantly being pushed forward and depends
strongly on the type of information expected, complex affinities, exchange
regime, and labeling scheme or, ultimately, on the operator's creativity. We
have, for instance, used a hybrid methodology based on a combination of NMR,
small-angle X-ray scattering (SAXS), site-directed mutagenesis, and
molecular docking to determine the structure of a weakly interacting 110 kDa
complex in which one component was a small protein, while the other
component was over 90 kDa [31, 41]. In these studies, we applied the
technique to gain insights into the structure of complexes formed among
proteins involved in the molecular machine that produces the essential
iron-sulfur cluster prosthetic groups.


8.6 The Successful Combination of the Two Techniques


NMR and native MS have been successfully combined to thoroughly investigate
non-covalent interactions, characterize binding sites, and monitor dynamic
events [42-44]. The two methods have, for instance, been employed to study
the specificity of the interaction between a bacterial ABC transporter
(McjD) and a ribosomally synthesized and post-translationally modified
peptide (RiPP) called microcin J25 (MccJ25), together with related sequences
[45]. Organisms synthesize RiPPs to damage closely related species. The RiPP
peptides are also harmful to the producing organisms, which have specific ABC
transporters to prevent self-toxicity. A primary 
site of interaction between MccJ25 and McjD 
was identified by CEST NMR experiments 
[45]. This is a sensitive approach based on the 
equilibrium between low molecular weight 
species (observable with solution NMR) and tran- 
siently bound states with slow-tumbling high- 
molecular-weight complexes, which cannot be 
observed directly in solution NMR as a result of 
line broadening. Based on the information obtained by CEST, a MccJ25
deletion variant lacking F10 and V11 (called MccJ25-ΔFV) was generated. The
interaction between MccJ25-ΔFV and McjD was then investigated using native
MS and a ligand-induced ATPase assay. The data indicated that MccJ25-ΔFV has
lower affinity for the transporter McjD than wild-type (WT) MccJ25, shedding
light on the mechanism of recognition between membrane transporters and
RiPPs. This study represents a stepping-stone towards the employment of
bacterial cell factories to produce novel bioactive compounds.

Another interesting example of the successful combination of NMR and native
MS was the study of two 37-amino-acid peptides which form amyloid fibrils in
the pancreatic islets of type 2 diabetes patients [46]. In this study, the
authors analyzed the human islet amyloid polypeptide (hIAPP) and a more
amyloidogenic natural variant of hIAPP carrying a point mutation at serine
20 (S20G). Native MS was applied to screen 1500 molecules from a
Protein-Protein Interaction (PPI) library. This allowed the identification
of compounds binding hIAPP. Among them, two small molecules were identified
for their considerable effects on hIAPP aggregation. Specifically, one of
the compounds delayed the aggregation of WT hIAPP, but not of the S20G
variant. The other molecule enhanced the rate of aggregation of both
peptides. 2D SOFAST-HMQC NMR allowed Xu et al. (2022) to characterize the
residues involved in the binding of small molecules to ¹⁵N-labeled monomeric
hIAPP [46].

Combination of native MS and NMR is particularly helpful for the
characterization of macromolecular complexes that are small or flexible,
“invisible” to EM, and generate no crystals. Recently, NMR spectroscopy and
native and crosslinking MS, combined with isothermal titration calorimetry,
small-angle X-ray scattering, and other methods, enabled scientists to
characterize the interaction of an HIV-1 protein (Rev) with a human nuclear
transport factor (Importin-β, Impβ) [47] (Fig. 8.4). Rev is crucial for
viral replication because it mediates the nuclear export of
intron-containing viral RNA transcripts. Rev contains many intrinsically
disordered residues. It is known that Rev is imported into the nucleus by
Impβ. However, it is poorly understood how Rev and Impβ associate. The
structural investigation of Impβ is particularly challenging. Indeed, this
nuclear transport factor is conformationally highly flexible, an important
feature that permits the efficient recognition of different cargo molecules.
Using integrative structural biology, this work provided novel insights into
the molecular recognition of Rev by Impβ and an atypical binding behavior of
Rev.

Fig. 8.4 Binding stoichiometry of the HIV-1 Rev/Impβ complex and its
interactions. (a) NMR analysis of Rev binding by Impβ. (a, left panel)
¹H,¹⁵N-HSQC spectrum of free RevOD (600 MHz, 283 K). RevOD is a mutated form
of Rev (V16D and I55N), which is oligomerization-deficient and monodisperse.
(a, right panel) ¹H,¹⁵N-HSQC spectrum of RevOD (blue) bound to Impβ and
superimposed on that of unbound RevOD (green). (b-g) Native MS spectra.
Peaks labelled by magenta circles correspond to unbound Impβ. Peaks labelled
by blue circles with a single dot or by green circles with two dots
correspond to Impβ/Rev complexes with 1:1 or 1:2 stoichiometry,
respectively. (b) Spectrum of Impβ in the absence of Rev. (c, d) Spectra of
Impβ incubated with a twofold or fivefold molar equivalent of RevOD,
respectively. (e) Spectrum of Impβ in the presence of a truncated Rev
construct (residues 4-69) lacking the C-terminal domain. (f, g) Spectra of
Impβ incubated with (f) a twofold or (g) a fivefold molar equivalent of
wild-type Rev (RevWT). (Adapted from [47])

The last example that we would like to discuss allowed the dynamics of a
large macromolecular complex to be monitored. Through the combined use of
native MS, NMR, and EM, it was possible to investigate the assembly pathway
of tetrahedral aminopeptidase 2 (TET2), a 468 kDa homo-complex [42]. During
complex assembly, the subunits undergo appropriate conformational
rearrangements to form a supramolecular structure. Tracking such
rearrangements represents a challenge because of the low abundance of
intermediate states that change over time. Although self-assembly is a
time-dependent process occurring at the molecular level, our understanding
of how it occurs often originates from static structures, low-resolution
techniques, and modelling. While NMR is in principle able to
monitor structural changes at the atomic level in 
real time, size and time resolution constraints 
represent a challenge. In the specific example of 
TET2, methyl specific labelling of deuterated 
TET2 combined with relaxation optimized, fast 
acquisition real-time NMR was used to overcome 
both size and time scale limits. Negative-stain EM allowed characterization
of the shape of the large oligomeric intermediates that appeared along the
self-assembly pathway. Aliquots of the sample were taken every 30 seconds
and negatively stained with ammonium molybdate. Thus, EM snapshots were
captured in a time-resolved manner. It was observed that the initial
intermediate of the TET2 self-assembly was small and undetectable by EM.
“Isotopic hybridization” and native MS were then used to better monitor the
initial events of self-assembly. The mass and stoichiometry of the hybrid
complex were assessed by native MS, thereby allowing the identification of
the intermediate at the very early stages of the self-assembly cascade,
namely the monomer. Thus, native MS complemented well the NMR and EM data
used to probe the structural changes of the subunits during the assembly of
a large macromolecular complex.


8.7 Concluding Remarks and Outlook


We have underlined here the individual advantages 
and limitations of NMR and native MS and 
discussed applications in which the two techniques 
were used in a complementary way. We strongly 
believe that the future of structural biology is in the 
integration of different techniques, including in 
silico predictions, that can complement and inte- 
grate each other [48]. This perspective in turn 
implies that structural biologists will need to be 
aware of the complementarities and of the possibil- 
ity of designing new ad hoc modalities to solve 
specific scientific problems. 

Another frontier of structural biology is work- 
ing in environments as close as possible to the 
biological ones. In this field, native MS has 
steadily gained momentum thanks to its wide 
applicability, speed of analysis, sensitivity, and 
selectivity [49-51]. This latter feature allows the 
simultaneous analysis and separation of mixtures 
of several species with quite different masses. 
This property was, for instance, exploited to 
image individual proteins and a macromolecular 
complex by low-energy electron holography, 
demonstrating that MS can be a preparative 
approach to purify heterogeneous assemblies for 
structural studies [17]. 

While NMR cannot easily discriminate non-labelled mixtures, it can, in
specific cases, explore proteins directly in cells of both prokaryotic and
eukaryotic organisms [52], making it possible to understand how the cellular
milieu affects features such as the folding/unfolding equilibrium of
proteins or interactions between proteins of interest and molecules
constituting the intracellular matrix [53].

When working directly with crude cell lysates and culture media (i.e.,
without the need for purification), the combined use of native MS and NMR
can inform on the identity, solubility, oligomeric state, and stability of
overexpressed biomolecules [54-56]. This allows the investigation of
proteins and macromolecular complexes that are only minimally purified [57].
Both techniques have been used for the investigation of endogenous
macromolecular assemblies in prokaryotic and eukaryotic hosts. Investigation
of endogenous complexes is in fact helping to decipher the function and
regulation of post-translational modifications, cofactors, and transient
complexes assembled in a cellular milieu [57-60]. The added value of the
sensitivity of native MS is that it permits such studies without the need
for recombinant protein over-expression [61, 62]. Once the components have
been identified, it is then possible to study them individually and solve
their structures by X-ray crystallography, NMR, or cryo-EM. In this
endeavor, NMR provides an important contribution, especially for IDPs or
disordered domains of longer proteins [63].

Finally, native MS and NMR have the poten- 
tial to become crucial tools for the characteriza- 
tion of biomaterials. A current trend is the 
investigation of biogenic polymers such as those 
needed by the industry to replace petroleum- 
based polymers with materials of biogenic origin 
to better develop the so-called “circular 
bioeconomy” [64]. Recent literature has for 
instance reported the investigation of squid pens 
(e.g., from Doryteuthis pealeii and Loligo 
vulgaris) to better exploit their features such as 
transparency and flexibility [65, 66]. In a different 
set of recent studies, it has been possible to char- 
acterize and determine the structure of mussel 
proteins [67, 68]. Some marine organisms are able to withstand aqueous tidal
environments and adhere tightly to wet surfaces. This behavior has attracted
increasing attention for potential applications in medicine, biomaterials,
and tissue engineering. Among the marine organisms with adhesion properties
are mussels, which adhere strongly to rock through the secretion of a
protein-based, thread-like structure called the byssus. These filaments
consist of bundles of fibers made of proteins that are synthesized in the
mussel foot, a large organ that in freshwater allows the mussel to pull
through the substrate and move. The combination of native MS and NMR will be
ideal to determine the structures of the individual proteins and to study
the mechanism of their assembly.

While these are exciting new avenues that can be explored in the future,
there is still a need to reflect on the technical advancements required in
these areas. For native MS, there is a need for further development of novel
mass spectrometers that provide even better sensitivity, resolution, and
ionization efficiency. Orbitrap
instruments already show improvements in 
sensitivity and resolution as compared to Q-TOF 
mass spectrometers [69-71]. Novel detectors 
advance the applicability of MS to study 
biological nanoparticles. For instance, 
nanoelectromechanical systems (NEMS) detect 
large masses with unprecedented sensitivity, 
requiring only a few hundred single-molecule 
adsorption events to detect megadalton molecules 
[72]. Another valuable approach is charge detection mass spectrometry
(CDMS), whereby the m/z and z are simultaneously measured for each ion
[73]. CDMS combined
with Orbitrap technology [74] was recently 
utilized to characterize adeno-associated viruses 
(AAVs), important vectors for gene therapy. The 
method allows users to quantify different AAV 
bioprocessing products and to rapidly assess the 
integrity and amount of genome packed in AAV 
particles [75]. 
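Measuring m/z and z for the same ion gives its mass directly, which the
short sketch below illustrates. It assumes positive ions whose charge is
carried entirely by protons, an assumption made here for simplicity (other
adducts are ignored), and both the function names and the ion values are
hypothetical.

```python
from statistics import mean

PROTON_MASS_DA = 1.007276  # mass of one proton in Da


def ion_mass(mz, z):
    """Neutral mass in Da of one ion from its m/z and its charge z.

    Assumes a positive ion whose z charges are all protons.
    """
    return z * (mz - PROTON_MASS_DA)


def mean_particle_mass(ion_events):
    """Average neutral mass over (m/z, z) pairs recorded for single ions."""
    return mean(ion_mass(mz, z) for mz, z in ion_events)


if __name__ == "__main__":
    # Hypothetical single-ion measurements of one megadalton particle type.
    events = [(25000.0, 150), (24835.4, 151), (25167.8, 149)]
    print(f"mean mass ~ {mean_particle_mass(events) / 1e6:.3f} MDa")
```

For megadalton particles the proton correction is tiny compared with the
uncertainty on z, so in practice the mass accuracy is governed mainly by how
precisely the charge of each ion is determined.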

On the NMR side, it cannot be denied that modern NMR spectroscopy in liquids
has reached an unprecedented level of sophistication, which covers the
determination of biomolecular structures, the study of dynamics at atomic
resolution, and the characterization of an impressive range of interactions
between molecules of very different sizes, from proteins and nucleic acids
to small molecules (e.g., drug screening) [76]. However, even having gone
from the millimolar to the micromolar range of affinities, the sensitivity
and specificity of NMR remain low compared to other techniques, especially
MS. This means that some important biological problems remain out of NMR's
reach and can benefit from complementary techniques. In the future, higher
field magnets and more efficient pulse schemes may help to widen these
possibilities.

In conclusion, we hope to have provided an 
overview, albeit by necessity incomplete, of the 
complementarities between two techniques of primary
importance in structural biology.

Abstract

X-ray crystallography has for most of the
last century been the standard technique to
determine the high-resolution structure of
biological macromolecules, including multisubunit
protein-protein and protein-nucleic acid
complexes as large as the ribosome and viruses. As
such, the successful application of X-ray
crystallography to many biological problems
revolutionized biology and biomedicine by
solving the structures of small molecules and
vitamins, peptides and proteins, DNA and
RNA molecules, and many complexes,
affording a detailed knowledge of the
structures that clarified biological and chemical
mechanisms, conformational changes,
interactions, catalysis, and the biological
processes underlying DNA replication, translation,
and protein synthesis. Now reaching
well into the first quarter of the twenty-first
century, X-ray crystallography shares the
structural biology stage with cryo-electron
microscopy and other innovative structure
determination methods, as relevant and central
to our understanding of biological function
and structure as ever. In this chapter, we
provide an overview of modern X-ray
crystallography and how it interfaces with other
mainstream structural biology techniques,
with an emphasis on macromolecular
complexes.

For nearly 60 years, macromolecular X-ray
crystallography (MX) has provided a steady stream
of high-quality, information-rich structures of 
biomolecules and of their complexes, to an extent 
that it is no exaggeration to state that the structural 
information generated by MX has revolutionized 
modern biology, biochemistry, and biomedicine. 
Until quite recently, the only high-resolution 
structures of macromolecular complexes have 
come from MX. There is virtually no aspect of
modern molecular and cellular biology that has
not been transformed at a fundamental level by 
the possibility to decipher the intricate molecular 
architecture of macromolecular complexes that 
participate in those processes. Nowadays, MX is 
a mature discipline that has spurred the develop- 
ment of methods for data processing, data treat- 
ment, structure determination, electron density 
map interpretation, macromolecular model build- 
ing and refinement, all of which play crucial roles 
in the advancement of macromolecular modeling 
and participate to varying degrees in the recent 
breakthrough in cryo-electron microscopy. In 
those various capacities, MX continues to be an 
active, important, and relevant discipline [1]. 
Many excellent reviews have been written 
on the use and practice of MX for the determina- 
tion of crystal structures of macromolecular 
complexes [2, 3]. Our goal is to highlight new 
aspects of MX that facilitate work with macromo- 
lecular complexes. From new techniques to 
produce homogeneous and stable samples of 
multisubunit complexes to approaches to solve 
complex, multicomponent structures, this review 
attempts to offer practical guidance and summarize
the main advances made in MX with regard
to multisubunit complexes. When introducing a
new section, we will make use of particularly 
interesting examples to make our arguments 
more memorable while showing real-life 
applications of the concepts discussed. 


9.2 Producing Multisubunit Complexes


A major hurdle to any structural technique is
obtaining a sufficient amount of high-quality
protein complex sample. This requirement is
especially acute for MX, since relatively large amounts
of protein sample are necessary to complete
the sparse-matrix crystallization screens. This
limitation has been alleviated by the arrival of
nanocrystallization techniques, which allow
between 3 and 10 times more crystallization
experiments to be performed than conventional, manual
techniques. By a “sufficient amount” we typically
mean several mg of a protein complex at a
concentration between 5 and 15 mg/mL (roughly
200-67 µL per mg). By “sufficient quality” we refer to
samples that are stable over the course of the
crystallization experiment (one day to several
weeks to months), ideally lack macro- and
microheterogeneity (e.g., proteolytic degradation,
chemical modifications), and are devoid of
extreme conformational flexibility (which would
preclude crystallization).
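
The relation between concentration and the volume occupied by each milligram of sample is simple arithmetic, and a minimal sketch may help when planning how far a preparation will stretch. The function names below are illustrative and not taken from any particular software package.

```python
def volume_per_mg_ul(concentration_mg_per_ml: float) -> float:
    """Return the volume (in microliters) that holds 1 mg of protein."""
    if concentration_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    # 1 mL = 1000 uL, so 1 mg occupies 1000 / c microliters at c mg/mL.
    return 1000.0 / concentration_mg_per_ml


def stock_volume_ul(mass_mg: float, concentration_mg_per_ml: float) -> float:
    """Return the total stock volume (in microliters) for a given protein mass."""
    return mass_mg * volume_per_mg_ul(concentration_mg_per_ml)


if __name__ == "__main__":
    # Concentrations spanning the range quoted in the text.
    for conc in (5.0, 10.0, 15.0):
        print(f"{conc:4.1f} mg/mL -> {volume_per_mg_ul(conc):6.1f} uL per mg")
```

The same helper can be used in reverse when deciding how far to concentrate a limited preparation before setting up screens.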

To reach the quality goal, expression systems
capable of generating the required amounts of
protein complexes are a must. This is no trivial
task, since multisubunit complexes often place
strong demands on the properties
of the expression system. For example, some
protein complexes cannot be assembled properly
unless certain chaperones are present. Other
protein complexes are assembled from individual
components that cannot exist stably or produce
diffraction-quality crystals on their own (e.g., because
they lack a defined structure outside their complex)
(Fig. 9.1). In other cases, the net expression rates
of all individual subunits must be matched
to avoid substoichiometric complexes or the
coexistence of complexes with varying
stoichiometry. Finally, certain subunits have specific
requirements that can only be met by one or
a few expression systems (e.g., native system,
post-translational modifications by eukaryotic
chaperones).

Since the 1990s, the development of ever more
powerful expression systems has been tied to the
success of structural biology as a discipline. In
parallel to the development of other key enabling
technologies (e.g., synchrotrons, X-ray detectors),
high-throughput expression screening and
miniaturization of expression tests have had a
transformative effect on how modern structural biology is
carried out. In particular, four expression systems
stand out as having played important roles in
advancing structural biology: (1) Escherichia
coli; (2) baculovirus-infected insect cell culture
(BVS); (3) mammalian cell culture; and (4) cell-free
systems (CFS) [5, 6]. In the context of this
review, it should suffice to state that animal cell
culture has made possible the production of
previously inaccessible protein complexes, including
transcription factor complexes, cellular receptors,
membrane proteins, and many others, and could
therefore be considered the most disruptive
expression technology of the turn of the
twenty-first century. In contrast, the E. coli expression
system has continued to drive the production of
countless proteins and protein complexes for
structural biology at a rate and cost-efficiency
unmatched by any other system. The continuous
optimization and fine-tuning of the E. coli system
has ensured its stability and relevance throughout a
period marked by the rise of animal cell culture.
Finally, although CFS has attracted
attention because of its inherent adaptability to
high-throughput settings and the promise of
simplifying protein expression and processing
workflows, its relatively higher costs and less
predictable results have severely limited
its adoption by the community. Before moving on
to crystallization, we should say that there are
many other useful and productive expression
systems that can be even more efficient at producing
certain proteins or protein complexes, but their
more specialized nature confines them to a
narrower range of applications. Amongst these
specialized systems are yeasts, protozoans, algal
and unconventional bacterial expression systems.
The reader is directed to the excellent reviews
available on these topics for further information.

Fig. 9.1 Production and crystallization of the iC3b-CR3 αI
heterodimeric complex. (a) Heterocomplex
assembled in vitro from stoichiometric amounts of
native-source iC3b and recombinant CR3 αI. (b) Size-exclusion
chromatography allows a complete separation between the
heterocomplex and excess free components. (c) Diffraction
image of iC3b-CR3 αI crystals. (Modified and
reproduced with permission from [4])


9.3 Crystallization and Crystal Handling


Crystallization is a well-known bottleneck for 
crystallography. Part of the difficulty with crys- 
tallization can be explained by the low probability 
with which large, flexible macromolecular 
entities settle on a thermodynamically stable 3D 
arrangement without precipitating or unfolding. 
Since it was first described, macromolecular crys- 
tallization has been considered an art that requires 
extensive trial-and-error screening to succeed. 
The transition from an art to a more systematic 
science, which is clearly not yet complete, has 
been unfolding slowly but steadily. 

The process by which a macromolecule
(or a complex) can be crystallized can be divided
into nucleation and growth [7-9] (Fig. 9.2).
Nucleation, the physicochemical process responsible
for producing crystal seeds from initially
homogeneously dissolved macromolecules, is an
intrinsically stochastic process. Although much is
now known about nucleation and how to bias the
formation of crystal nuclei, finding adequate
conditions to start crystal formation remains a
trial-and-error process requiring extensive, time-
and sample-consuming experimentation. Growth,
which can only begin in the presence of crystal
nuclei, has distinct requirements from nucleation.
Although nucleation and growth can often
be accomplished under a single set of crystallization
conditions, decoupling nucleation from
growth by using micro- or macroseeding
techniques can be advantageous when obtaining
large, well-diffracting crystals proves challenging.

Fig. 9.2 Macromolecular crystallization. (a) Phase diagram
of a protein crystallization (axes: protein concentration
versus salt concentration). Both nucleation and
crystal growth may occur in the labile (supersaturated)
zone, while only crystal growth can occur in the
metastable (supersaturated) zone. (b) Diverse protein
crystals displaying the range of habits they may assume.
(Reproduced with permission from [9])

Nanocrystallization under regulated humidity
conditions can be singled out as one of the
transformative advances of crystallography [8, 10]. On
the one hand, nanocrystallization has reduced
the amount of costly protein sample consumed
by a factor of 5-10 simply by making it
technically feasible to miniaturize crystallization
experiments from 1-2 µL to 100-400 nL
(0.1-0.4 µL). On the other hand, the reduction
in sample volume makes it possible to screen a wider range
of crystallization conditions with a given amount
of sample, hence increasing the chances of
finding suitable crystallization conditions for
scarce protein samples.
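
The gain from miniaturization can be illustrated with a short calculation of how many drops, and hence how many screens, a fixed stock of sample supports. This is only a sketch under stated assumptions: the per-drop volume is taken to be the protein contribution to the drop, the 96-condition screen size is a common plate format chosen here for illustration, and dead volumes and pipetting losses are ignored.

```python
import math


def drops_available(stock_volume_ul: float, protein_per_drop_nl: float) -> int:
    """Number of crystallization drops that a protein stock can supply."""
    if stock_volume_ul < 0 or protein_per_drop_nl <= 0:
        raise ValueError("volumes must be positive")
    # Convert the stock to nanoliters; the small epsilon guards against
    # floating-point round-off when the division is exact.
    return math.floor(stock_volume_ul * 1000.0 / protein_per_drop_nl + 1e-9)


def full_screens(n_drops: int, conditions_per_screen: int = 96) -> int:
    """Number of complete screens (assumed 96 conditions each) that can be set up."""
    return n_drops // conditions_per_screen


if __name__ == "__main__":
    stock_ul = 100.0  # hypothetical stock, e.g. 1 mg at 10 mg/mL
    for drop_nl in (1000.0, 200.0, 100.0):
        n = drops_available(stock_ul, drop_nl)
        print(f"{drop_nl:6.0f} nL/drop -> {n:4d} drops, {full_screens(n)} full screens")
```

Going from 1 µL to 200 nL of protein per drop multiplies the number of conditions that can be tested by five for the same stock, in line with the factor quoted above.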

The accumulation of knowledge on the effect
of various families of precipitants on the
crystallization of macromolecules has permitted the
design of more powerful sparse-matrix
crystallization screens, thus increasing the success rate of
crystallization experiments and reducing
the chemical space that needs to be
explored. Similarly, a greater understanding of
the redundancies present in different
crystallization screens has led to a more judicious selection
of initial crystallization screens and an associated
enhanced efficiency and economy in the use of
valuable protein complex samples.

Micro- and macroseeding are crystallization
techniques that attempt to decouple nucleation
from growth in such a way that conditions can
be optimized for either process independently
[11, 12]. In certain cases, microseeds obtained
from low-quality crystals or even microcrystalline
precipitate have been used in sparse-matrix
crystallization screens with higher success
rates than the original macromolecular solution.
Micro- and macroseeding have been automated
using nanocrystallization robots. A particular
application of microseeding using seeds obtained
from a different but related macromolecule
(heterogeneous microseeding) has been successful
for certain classes of macromolecules (e.g.,
antibodies), and offers the promise of aiding the
crystallization of otherwise reluctant protein
samples.

Issues with cryoprotection of macromolecular 
crystals have also been tackled as optimization 
problems [13]. Screens of cryoprotectants have 
been devised, typically by mixing different 
families of cryoprotectants in osmotically bal- 
anced cocktails. 

Although not conventionally regarded as part 
of the crystallization process, crystal handling and 
freezing, when properly done, presuppose the 
application of knowledge gained from the 
macromolecule’s crystallizability properties.
Factors such as crystal shape, size, density, 
and solvent content, in addition to the nature 
and concentration of the various precipitants and 
additives present in the crystallization experi- 
ment, have to be carefully considered. Lately, 
the first automated systems for crystal snap freez- 
ing have appeared that combine robust cameras 
and image recognition and processing software 
with robotic arms capable of looping crystals
and plunging them into liquid nitrogen with surgical
precision.

With the advent of X-ray free electron laser
(XFEL) sources (fourth-generation synchrotron
radiation sources), it has
become possible to image the structures of
macromolecular complexes that associate only
weakly in clusters of a few units, challenging
the very use of the term crystal for those species
[14] (see Chap. 10 by A. Round et al.).


9.4 X-Ray Diffraction Experiment and Data Processing


X-ray diffraction (XRD) requires highly
specialized synchrotron beamlines and sophisticated
infrastructure, which are themselves
adapted to the types of experiments that are
performed on each beamline. Planning and
executing XRD experiments depend on many
considerations, and critical decisions must be
taken regarding crystal shape and size, alignment,
radiation dose tolerance, internal symmetry,
presence or absence of certain morphological defects
(e.g., twinning, pseudosymmetry), and the
purpose of the experiment itself (e.g.,
single-wavelength native dataset vs. multi-wavelength anomalous
diffraction vs. sulfur phasing). The basic XRD
experiment at a synchrotron source uses the
rotation method of diffraction data collection, which
has been used extensively and continues to be the
preferred method (Fig. 9.3). Excellent reviews
have been written that cover some of these
aspects and many others, and we will not review
them again here. Our focus will be on specific
features of the XRD experiments that affect the
outcome of experiments conducted on crystals
of macromolecular complexes.

To start with, crystals of macromolecular 
complexes tend to reflect the specific physico- 
chemical properties of multiprotein complexes 
when compared to single proteins: they are charac- 
teristically more fragile, contain higher solvent 
content and their crystal unit-cell dimensions are 
larger. Diffraction patterns therefore become less 
sharp, reach lower resolution, exhibit a greater 
variety and intensity of crystal defects, and, overall,
the crystals tend to be less tolerant of radiation damage. In
addition, the greater size of protein complexes 
translates into patterns of diffraction spots that are 
much closer along certain directions and sparser in 
other directions. The nonrandom distribution of 
reflection spots, combined with the more blurred 
point-spread function for protein complex crystals, 
increases the difficulties associated with collecting 
quality diffraction patterns. 
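
The crowding of spots along long unit-cell axes follows from simple geometry. In the small-angle approximation, neighbouring reflections along a cell axis of length a are separated on the detector by roughly λD/a, where λ is the wavelength and D the crystal-to-detector distance. The sketch below applies this approximation; the function names are illustrative, and the default pixel size is an assumed value used only to express the result in pixels.

```python
def spot_separation_mm(wavelength_a: float,
                       detector_distance_mm: float,
                       cell_axis_a: float) -> float:
    """Approximate spacing (mm) between adjacent spots along one cell axis.

    Small-angle approximation: separation ~ wavelength * distance / cell axis.
    """
    if min(wavelength_a, detector_distance_mm, cell_axis_a) <= 0:
        raise ValueError("all inputs must be positive")
    return wavelength_a * detector_distance_mm / cell_axis_a


def spot_separation_px(wavelength_a: float,
                       detector_distance_mm: float,
                       cell_axis_a: float,
                       pixel_size_mm: float = 0.172) -> float:
    """Same spacing expressed in detector pixels (pixel size is an assumption)."""
    return spot_separation_mm(wavelength_a, detector_distance_mm, cell_axis_a) / pixel_size_mm


if __name__ == "__main__":
    # Hypothetical comparison: a 100 A axis versus a 500 A axis, same geometry.
    for axis in (100.0, 500.0):
        mm = spot_separation_mm(1.0, 300.0, axis)
        px = spot_separation_px(1.0, 300.0, axis)
        print(f"a = {axis:5.0f} A -> {mm:.2f} mm ({px:.1f} px) between spots")
```

Increasing the detector distance or the wavelength spreads the spots apart, at the cost of the maximum resolution recorded on the detector.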


9.4.1 Data Collection Strategies 

Innovative data collection strategies have played
fundamental roles in the successful structure solution
from crystals of macromolecular complexes with
various crystalline defects or pathologies. For
example, mesh screening has made it possible to find suitable
crystal regions for diffraction in crystals with
heterogeneous diffraction properties or where ice
or salt precipitates may interfere with protein
diffraction. For needle-shaped crystals or
crystals with weak diffraction patterns, another
non-conventional yet popular strategy consists of
overexposing small regions of a larger crystal in
narrow overlapping angular wedges and scaling
and merging the independent wedges into a single
data set. Finally, helical data collection is a
refinement of the previous strategy which has facilitated
the acquisition of complete data sets for extremely
long, often flexible crystals. These approaches
have benefitted from the development of more
reliable automatic data processing and merging
procedures, which have played a central role in
successfully processing challenging data sets.

Fig. 9.3 The rotation method of X-ray diffraction data
collection. (a) Schematic representation of the rotation
method in MX. The incoming X-ray beam impinges on
the macromolecular crystal as it is rotated around an axis
perpendicular to the X-ray beam; after passing through the
crystal, the direct beam is blocked by the beamstop. The
diffracted X-rays continue their journey to the detector,
where they are recorded. A cryoprotected crystal inside a
small nylon loop and one diffraction image collected from
it are shown. (b) Most MX structures in the PDB have been
solved using complete X-ray datasets acquired by the
rotation method. Example MX structures are shown from
left to right in cartoon representation and rainbow colors:
endo-β-1,4-xylanase from Fusarium oxysporum (PDB ID
5JRN) [15], xylanase Xys1A from Streptomyces halstedii
(PDB ID 1NQ6) [16], GAPDH from Atopobium vaginae
(PDB ID 5LDS) [17]


9.4.2 Multi-crystal and Microcrystal Mounts


Although diffracting single crystals is the stan- 
dard method, there are few theoretical reasons 
why the crystallographer should mount a single 
crystal on a loop for diffraction. First and
foremost, a single mounted crystal removes ambiguity
about the source of the diffraction spots
and avoids the need for computer deconvolution 
of overlapping diffraction patterns. However, 
for screening of many microcrystals it may be 
more practical to mount several crystals on one 
loop and rely on synchrotron beamlines to iden- 
tify and diffract each crystal independently. 
Specialized mounts devised for accommodating 
several crystals in a row are under testing. In its 
most extreme form, direct diffraction of screen- 
ing crystallization plates [18] can be done at 
certain beamlines which have implemented the 
required hardware and software procedures. 
Successful in-plate diffraction requires auto- 
mated light-microscope imaging of the crys- 
tallization plate, software tools to mark 
crystallization drops and specific crystals for 
XRD, and robotic plate transfer. In the context 
of macromolecular complexes and membrane 
protein crystallography, multi-crystal and micro- 
crystal mounts could speed up the process of 
screening optimal conditions for the diffraction 
experiment. 


9.4.3 Multi-crystal Approaches and Serial X-Ray Crystallography


Alongside conventional single-axis rotation dif- 
fraction, other diffraction methods have been 
adopted to collect XRD data optimally for spe- 
cific applications. For example, multi-crystal 
native SAD (MDS) and serial X-ray crystallogra- 
phy (SSX) exploit the fact that low-dose XRD 
spots can be collected from multiple (isomor- 
phous) crystals, processed, and merged to yield 
highly complete, highly redundant datasets. 
These strategies work better than more conven- 
tional approaches when crystals are very radiation 
sensitive or when the anomalous diffraction 
effects that need to be measured are close to the 
background intensity signal. Although they still 
rely on a rotating crystal, the angular wedge that 
must be sampled per individual crystal can be 
reduced. 
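
The benefit of merging many weak measurements can be sketched with an inverse-variance weighted mean, a standard way of combining repeated observations of the same quantity. The example below is deliberately simplified and is not the procedure of any specific data processing program: it assumes that intensities from different crystals have already been indexed consistently and placed on a common scale, and it omits symmetry reduction and outlier rejection.

```python
import math
from collections import defaultdict


def merge_observations(observations):
    """Merge repeated measurements of each reflection.

    observations: iterable of (hkl, intensity, sigma) tuples, where hkl is a
    tuple of Miller indices. Returns {hkl: (merged_I, merged_sigma, n_obs)}.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # sum(w*I), sum(w), count
    for hkl, intensity, sigma in observations:
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        weight = 1.0 / sigma ** 2  # inverse-variance weight
        entry = acc[hkl]
        entry[0] += weight * intensity
        entry[1] += weight
        entry[2] += 1
    return {
        hkl: (sum_wi / sum_w, 1.0 / math.sqrt(sum_w), n)
        for hkl, (sum_wi, sum_w, n) in acc.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements of two reflections from different crystals.
    data = [((1, 0, 0), 100.0, 10.0), ((1, 0, 0), 110.0, 10.0), ((0, 1, 0), 50.0, 5.0)]
    for hkl, (i_mean, sig, n) in merge_observations(data).items():
        print(hkl, f"I = {i_mean:.1f}, sigma = {sig:.2f}, n = {n}")
```

For n measurements of equal precision, the merged uncertainty shrinks as 1/√n, which is why high redundancy helps when the anomalous differences are close to the noise level.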


9.4.4 X-Ray Free Electron Laser (XFEL)


Taken to its extreme, XRD data can also be
collected from X-ray sources other than synchrotrons
and in-house diffractometers. For example, the
highly energetic X-ray laser pulses created by
X-ray free electron laser (XFEL) sources make it possible to
collect diffraction data from crystallites 100 nm
across [19] (see Chap. 10 by A. Round et al. for
further information on XFEL). A single shot can be
collected from each crystallite, since the large
amount of energy absorbed by the crystallite
destroys it a few femtoseconds later (the
“diffraction before destruction” method). The intensity
and destructive power of the XFEL pulse
are incompatible with the rotation method. Instead,
XFEL experiments use many stationary crystals in random
orientations to compensate for the few reflections
that are in reflecting condition while stationary. To
collect a complete dataset, it is therefore necessary 
to direct the X-ray laser pulses toward a steady 
stream of crystallites flowing along a direction per- 
pendicular to the direction of the laser pulses. The 
data collection technique developed for this purpose 
is called serial femtosecond crystallography (SFX) 
[19]. Although still in its infancy, XFEL facilities 
and the SFX collection method show great promise 
for crystallographic structure determination of large 
complexes, since the requirement for large crystals, 
which is generally harder to realize for large 
multisubunit complexes than for single proteins, is 
partially relieved. In addition to static structural 
information, XFEL and related developments can 
potentially yield information on time-resolved 
processes. 


9.5 Phasing and Structure Determination


As with X-ray data collection, the structure deter- 
mination methods for macromolecular crystallog- 
raphy have been the subject of rigorous reviews. 
For this chapter, we would like to emphasize that 
most protein complex structures require special 
treatment because of the number of subunits and
often the intricacies of protein-protein interfaces.
Protein heterocomplex structures are rarely 
solved exclusively by ab initio or de novo 
methods unless the number of subunits is low 
(<3); homocomplexes are different in that the 
repetition of identical subunits allows for the 
application of powerful electron-density averag- 
ing methods capable of solving very large 
structures. Higher-order heterocomplexes typi- 
cally need additional information in the form of 
partial molecular replacement searches. 

Single-wavelength anomalous diffraction (SAD) using the
weak anomalous signal extractable from light
atoms by diffracting native crystals with
long-wavelength X-rays is a powerful technique that
does not require heavy-atom derivatization, nor
does it introduce artefacts in the structure by virtue
of covalent bonding to heavy metals.
Long-wavelength SAD is epitomized by sulfur (S)-SAD,
although in practice the anomalous scattering
from other light atoms contributes to the overall
intensity and measurability of the anomalous
diffraction effects from native crystals. Besides S,
atoms such as P, Cl, Br, K+ and Ca2+ have
measurable anomalous diffraction effects. For
years, both the available hardware and analytic
techniques dictated that S-SAD phasing was
only feasible for crystals of small proteins
diffracting to very high resolution and, hence, for
asymmetric units containing small, compact
proteins. This consideration alone would suggest
that S-SAD is not suited for the study of protein
complexes, whose size and less ordered
arrangements reduce the diffraction power of
their crystals. However, recent breakthroughs in
S-SAD phasing from the Hendrickson lab (the
multi-crystal native SAD method) [20-22], the
conquest of low-diffracting crystals (as low as
4 Å) [23], and the determination of moderately
large protein complexes by S-SAD all argue
otherwise. The possibility that native SAD is in practice
not limited to strongly diffracting crystals and
hence applicable to macromolecular complexes
has been brilliantly demonstrated by the phasing
of the 266-kDa T2R-TTL (αβ-tubulin, stathmin-4
and tubulin-tyrosine ligase) complex [24] and the
132-kDa histidine kinase TorT/TorSS complex
[25]. With the recent deployment of
long-wavelength SAD dedicated synchrotron beamlines
furnished with the most recent technological
advancements aimed at boosting the
signal-to-noise ratio from the weak anomalous signal from
light atoms like S, P, and K+, we can be confident
that many more crystal structures of
macromolecular complexes will become solvable by these
methods in the coming years [1, 26].
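
To give a feel for how weak the native anomalous signal is, the expected Bijvoet ratio can be estimated with the widely used approximation ⟨|ΔF|⟩/⟨|F|⟩ ≈ √2 · √N_A · f″ / (√N_P · Z_eff), where N_A is the number of anomalous scatterers, f″ their anomalous scattering factor at the chosen wavelength, N_P the number of non-hydrogen protein atoms, and Z_eff ≈ 6.7 the effective atomic number of an average protein atom. The sketch below is only an order-of-magnitude estimate; the atom counts in the usage example are hypothetical, and the f″ value for sulfur (about 0.56 electrons near 1.54 Å) is an approximate tabulated figure that should be checked for the wavelength actually used.

```python
import math


def bijvoet_ratio(n_anomalous: int,
                  f_double_prime: float,
                  n_protein_atoms: int,
                  z_eff: float = 6.7) -> float:
    """Estimate the expected anomalous signal <|dF|>/<|F|> (dimensionless)."""
    if n_anomalous < 0 or n_protein_atoms <= 0 or z_eff <= 0:
        raise ValueError("invalid atom counts or effective atomic number")
    return (math.sqrt(2.0) * math.sqrt(n_anomalous) * f_double_prime
            / (math.sqrt(n_protein_atoms) * z_eff))


if __name__ == "__main__":
    # Hypothetical complex: 60 sulfur atoms among 20,000 non-hydrogen atoms.
    ratio = bijvoet_ratio(n_anomalous=60, f_double_prime=0.56, n_protein_atoms=20_000)
    print(f"Expected Bijvoet ratio: {100.0 * ratio:.2f} %")
```

Because f″ for sulfur increases at longer wavelengths, the same calculation helps to rationalize the move toward dedicated long-wavelength beamlines.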


9.6 Macromolecular Refinement 
In MX, refinement refers to the process by which 
the parameters describing the macromolecular 
structure are iteratively updated to increase the 
match between the model and the experimental 
data. Parameters such as the spatial atomic 
positions (x, y, z) and the atomic displacement 
parameters (ADP) or B-factors are typically 
optimized during refinement. 
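
A common way to quantify the match between model and data is the crystallographic R-factor, R = Σ| |F_obs| − k|F_calc| | / Σ|F_obs|, which is not discussed explicitly in this chapter but is reported for essentially every crystal structure. The sketch below computes it for two lists of structure-factor amplitudes; the single linear scale factor k is a simplifying assumption, and modern refinement programs use more elaborate scaling and maximum-likelihood targets.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor between observed and calculated amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    sum_obs = sum(f_obs)
    sum_calc = sum(f_calc)
    if sum_obs <= 0 or sum_calc <= 0:
        raise ValueError("amplitude sums must be positive")
    # Simple linear scale bringing the calculated amplitudes onto the observed scale.
    k = sum_obs / sum_calc
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum_obs


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0]      # hypothetical |F_obs|
    calculated = [12.0, 18.0, 30.0]    # hypothetical |F_calc|
    print(f"R = {r_factor(observed, calculated):.3f}")
```

In practice the same statistic is also computed on a small set of reflections excluded from refinement (the free R-factor) to guard against overfitting.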

Refining large macromolecular complexes is a 
challenging task confounded by the sheer size, 
number of subunits, greater flexibility, and disor- 
der present in those structures. Dedicated refine- 
ment algorithms and software implementing them 
have been developed precisely to circumvent 
those limitations of conventional macromolecular 
refinement programs. 

Classically, refining multisubunit complexes 
has relied upon rigid-body refinement (to adjust 
starting positions) and noncrystallographic sym- 
metry (NCS) averaging across complete models 
or domains. In both cases, the strategies make use 
of reduced representations of the highly complex 
macromolecular models to improve the crystallo- 
graphic phases at the onset, when the mean phase 
error of the (incomplete) models is greater. In 
some cases, these strategies are sufficient to pro- 
duce electron density maps of sufficiently high 
quality to allow progressive manual building and 
refinement cycles until a complete model, 
with acceptable geometric and stereochemical 
parameters, finally emerges. In most cases, how- 
ever, significant portions of the macromolecular 
models (sometimes entire subunits) cannot be 
traced to any satisfactory degree. Deficiencies
in multisubunit models can also include
imperfections in the tracing of many side chains,
secondary structures, and loop regions, in addition
to more discrete errors as pointed out before.
To cope with the increased flexibility of com- 
plex macromolecular models, several algorithms 
have been proposed to treat macromolecules 
(or well-defined parts of them, like helices) as 
elastic entities that can be continuously bent or 
twisted to better fit the observed data. Elastic 
Network (EN) modeling represents an improve- 
ment for many-atom models because it finds 
shortcuts across complicated energy landscapes 
which would take ages to traverse for conventional
algorithms. Like rigid-body modeling and
NCS averaging, EN modeling has the advantage
of reducing the dimensionality of the macromolecular
models while allowing a seamless transition
between the EN and atomic refinement.

What is the role of B-factor modeling for
macromolecular complexes? Large complexes
often show highly anisotropic distributions of
B-factors, including significant differences in
overall and atomic B-factors across subunits. To
improve the treatment of these and other
limitations of the default isotropic B-factor, TLS
modeling was introduced (TLS, or
Translation/Libration/Screw, a mathematical model that
describes the local positional displacement of
atoms in a crystal structure). Macromolecular
models can exhibit wide variations in B-factors within and
between domains, as well as between side-chain and
main-chain atoms. Part of this variation can be
accurately modeled by, e.g., assuming a continuous
variation of B-factors along bonded atoms,
which can be expressed as B-factor restraints.
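
For readers less familiar with B-factors, it may help to recall that the isotropic B-factor is related to the mean-square atomic displacement ⟨u²⟩ by B = 8π²⟨u²⟩. The short sketch below converts between the two; the function names are illustrative.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2


def b_factor_to_msd(b_factor: float) -> float:
    """Mean-square displacement (A^2) for an isotropic B-factor (A^2)."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return b_factor / EIGHT_PI_SQ


def b_factor_to_rms(b_factor: float) -> float:
    """Root-mean-square displacement (A) for an isotropic B-factor (A^2)."""
    return math.sqrt(b_factor_to_msd(b_factor))


def msd_to_b_factor(msd: float) -> float:
    """Isotropic B-factor (A^2) for a mean-square displacement (A^2)."""
    return EIGHT_PI_SQ * msd


if __name__ == "__main__":
    for b in (20.0, 60.0, 120.0):
        print(f"B = {b:5.1f} A^2 -> rms displacement = {b_factor_to_rms(b):.2f} A")
```

Seen this way, the high B-factors common in flexible subunits of large complexes correspond to positional spreads that approach or exceed one ångström.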


9.7 Model Building 

Macromolecular complexes pose additional 
challenges to be properly modeled from XRD 
data. The sheer amount of information that must 
be captured to duly describe the crystal structure 
of a multisubunit complex has spurred additions 
and expansions to the conventional PDB models. 
For example, the PDB file for the ribosome
contains many additional chain labels absent
in most other PDB entries. Unconventional
techniques for PDB annotation are required to
describe the additional complexity present in
large complexes. Another format has been
developed that removes all restrictions on the size of
the structure: the macromolecular Crystallographic
Information File (mmCIF). The mmCIF
is a well-structured format based on a dictionary,
which includes definitions and descriptions about
the structure and its experimental determination.
Besides the more complicated bookkeeping
imposed by the size of macromolecular complex 
structures, other challenges for modeling of 
complexes have to do with the increased number 
of degrees of freedom and the comparatively 
lower data density of the diffraction patterns. 
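
The size restrictions of the legacy PDB format stem from its fixed-width records: the atom serial number occupies five columns (at most 99,999 atoms) and the chain identifier a single column. A minimal check of whether a model fits within those limits could look as follows; the function name and the example numbers are hypothetical, and the check ignores other fixed-width fields such as residue numbers.

```python
PDB_MAX_ATOM_SERIAL = 99_999   # atom serial number is a 5-column field
PDB_CHAIN_ID_WIDTH = 1         # chain identifier is a single column


def fits_legacy_pdb(n_atoms: int, chain_ids) -> bool:
    """Return True if atom count and chain labels fit the fixed-width PDB format."""
    if n_atoms < 0:
        raise ValueError("atom count must be non-negative")
    atoms_ok = n_atoms <= PDB_MAX_ATOM_SERIAL
    chains_ok = all(len(chain) == PDB_CHAIN_ID_WIDTH for chain in chain_ids)
    return atoms_ok and chains_ok


if __name__ == "__main__":
    # Hypothetical examples: a small heterodimer and a large multi-chain assembly.
    print(fits_legacy_pdb(12_000, ["A", "B"]))            # small complex
    print(fits_legacy_pdb(150_000, ["A", "B", "AA"]))     # large assembly
```

Models that fail such a check are the ones for which mmCIF, with its unrestricted field widths and multi-character chain labels, becomes the practical choice.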

The use of macromolecular models of
complexes in other modeling techniques is also
affected by the size and complexity of the models
and requires combining
the modeling methods with experimental
restraints to maintain reasonable trajectories
during modeling.

Rich restraint sets can be obtained from many 
biochemical, biophysical, and structural methods 
including nuclear magnetic resonance (NMR), 
small-angle X-ray scattering (SAXS), and mass 
spectrometry (MS). SAXS is well-suited as a 
companion technique to XRD, since it can pro- 
vide corroboration of the architecture and overall 
subunit arrangement of the complex as well as 
lending itself to the quick testing of structural 
hypotheses (see Chap. 11 by S. Hutin et al.). 


9.8 Structure Analysis and Interpretation


The goal of determining the crystal structure 
of a macromolecular complex is to understand 
the biological processes the chosen complex 
participates in. To draw sound inferences about 
the function of a protein complex, the structure 
must represent a functional form of the complex, 
in the correct (physiological) stoichiometry. Oth- 
erwise, the inferences drawn might be wrong or 
be seriously misleading. For complexes, a central 
concern is how to define the biological assembly
starting from the array of experimentally
observed crystallographic assemblies (i.e., those 
defined by crystal contacts regardless of their 
presence in solution) [27]. Since the precise solu- 
tion assembly further depends on the protein 
constructs used for crystallization and the 
presence of other molecules (e.g., allosteric 
modulators, binding partners), it is important to 
recognize the correct biological assembly. 
Properties that aid the recognition of the correct 
biological assembly include the stoichiometry, 
interface sequence and structural conservation, 
and symmetry. 

Often, the process of reconstructing biological 
function from crystal structures is stepwise and 
laborious and may need the use of concurrent 
evidence garnered from structural biology 
methods or from other disciplines (e.g., cell biology).
In all cases, technical accuracy and a deep
understanding of the physicochemical properties
of proteins and other macromolecular entities are
essential. Models that seem to
agree with the experimental data but conflict with
other pieces of evidence should indicate to the
practitioners that something is seriously flawed
about the models thus built.

A limitation of many structural techniques is 
that they can only capture a snapshot of the 
macromolecules, not the entire biological process 
as it is supposed to unfold. Crystallography can 
describe individual lowest-energy conformations 
one at a time (or, rather, one per crystal). If the 
crystallographers are lucky, two lowest-energy
conformations can be captured in the same crystal,
although such cases are statistically
improbable and thus are only very rarely found. 

Despite that limitation, critical information can 
be obtained from crystal structures of protein 
complexes. For example, the organization of
protein-protein and protein-nucleic acid interfaces
is very informative about the functions performed
by the interacting partners. It also frequently
suggests ways in which functionally important
movements or dynamics could occur by showing
the presence of hinges, glide axes or small
molecule-binding pockets that may block or
impair the interaction. The accuracy of
crystallographic models ensures that the conclusions drawn
about protein-protein and protein-nucleic acid
interfaces rest on solid grounds.

Especially for novices in MX who try to make
sense of complex macromolecular models, it
should be noted that the degree of uncertainty in
atomic positions (and even of complete side
chains and segments of structure) is determined
by local and across-the-crystal disorder.


9.9 Power of Two: Combining XRD with SAXS and Cryo-EM


SAXS and XRD have often been used jointly to
produce combined interpretations of
macromolecular complexes [reviewed in 28].
Typically, SAXS provides pseudo-independent
corroboration for unusual or highly novel features
observed in XRD models and can be used to
determine which one of several possible
heteromeric arrangements corresponds to the
physiological state (as observed in solution). At
a minimum, agreement between the experimental
solution scattering and the theoretical 1D
scattering calculated from the various possible
complexes observed in crystalline form can
confirm the absence of gross structural artifacts.
Beyond this confirmatory role, when XRD models
are incomplete owing to, e.g., the absence of
interpretable electron density in loop regions,
SAXS data can also be used to reconstruct missing
loops or larger parts of the model. SAXS data have
also been used in the stepwise reconstruction of
ever larger macromolecular complexes when
crystals of the complete complexes cannot be
obtained but structures of the individual
components or subcomplexes exist. Rigid and
flexible fitting of subcomplexes into the SAXS
data for complete complexes are powerful methods
to discern the interrelationships linking the
complex subunits. Because the theoretical 1D
scattering curve of any 3D structure can be
calculated directly, XRD and SAXS can be readily
combined to produce reliable interpretations of
the structural complexes.
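
The comparison described above can be sketched in a few lines of code. The
example below computes a theoretical 1D scattering curve from 3D coordinates
with the Debye formula and ranks candidate assemblies by their reduced
chi-square against an experimental curve. It is only a minimal sketch under
simplifying assumptions that do not come from the text: scatterers are treated
as points with identical, q-independent form factors, solvent and hydration
shell contributions are ignored, and the function names and toy coordinates are
purely illustrative. Dedicated SAXS programs handle these effects properly.

```python
import numpy as np


def debye_intensity(coords, q, f=None):
    """Theoretical 1D scattering I(q) of a point-scatterer model (Debye formula)."""
    coords = np.asarray(coords, dtype=float)
    q = np.asarray(q, dtype=float)
    if f is None:
        # Assumption: identical, q-independent form factors for all scatterers.
        f = np.ones(len(coords))
    f = np.asarray(f, dtype=float)
    # Matrix of all pairwise distances r_ij.
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi*x)/(pi*x), so sin(q*r)/(q*r) = np.sinc(q*r/pi);
    # this also returns 1 for q*r = 0 (the i == j terms).
    kernel = np.sinc(q[:, None, None] * r[None, :, :] / np.pi)
    return np.einsum("i,j,qij->q", f, f, kernel)


def reduced_chi2(i_exp, sigma, i_calc):
    """Reduced chi-square after least-squares scaling of the calculated curve."""
    w = 1.0 / sigma**2
    scale = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    return np.sum(w * (i_exp - scale * i_calc) ** 2) / (len(i_exp) - 1)


def rank_assemblies(candidates, q, i_exp, sigma):
    """Return (name, chi2) pairs sorted from best to worst agreement."""
    scores = {
        name: reduced_chi2(i_exp, sigma, debye_intensity(xyz, q))
        for name, xyz in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Toy example: two hypothetical four-bead "assemblies" (coordinates in Å).
q = np.linspace(0.01, 0.5, 50)  # scattering vector magnitudes in 1/Å
compact = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [10, 10, 0]], dtype=float)
extended = np.array([[0, 0, 0], [10, 0, 0], [20, 0, 0], [30, 0, 0]], dtype=float)

# Synthetic "experimental" curve generated from the compact arrangement.
i_exp = debye_intensity(compact, q)
sigma = 0.02 * i_exp + 1e-3

ranking = rank_assemblies({"compact": compact, "extended": extended}, q, i_exp, sigma)
```

In this toy case the compact arrangement is ranked first because the synthetic
curve was generated from it. With real data, several candidates may fit almost
equally well, which is why SAXS is best treated as corroborating evidence.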

XRD and cryo-EM are highly compatible
techniques [29]. Before the resolution revolution
undergone by cryo-EM [30, 31], modeling of
supramolecular complexes by cryo-EM had to
rely on atomic models determined by XRD.
Docking of XRD models into cryo-EM maps,
either as rigid bodies or after allowing for limited
flexible modifications to the models, represents
the most common approach to combining the two
techniques. The docking approach can be viewed
as the incorporation of high-resolution structural
information (on specific components) into a larger
complex solved at otherwise moderate resolution.
The robustness of the Coulomb potential maps
built from low-resolution cryo-EM data and the
precision of docking algorithms have both been
confirmed when higher-resolution cryo-EM
structures became available.

Since the advent of modern electron
microscopes, direct detectors, and the software
advances that have made possible the collection
of high-resolution images of macromolecules by
cryo-EM, XRD and cryo-EM have assumed
an even more complementary role [32] (see
Chap. 12 by A. Deniaud et al. and Chap. 13 by
J.L. Carrascosa). Software developed for model
building, refinement, and validation in XRD can
now be adapted for cryo-EM. In fact, historically
crystallography-exclusive software suites like
CCP4 and PHENIX have branched out with
applications that extend their usefulness to
cryo-EM models [33-35]. The increase in quality
and resolution of cryo-EM maps has facilitated
the use of such maps for the phasing and structure
determination of XRD structures [36, 37]. This
has proven critical in some cases where XRD
amplitudes were measured to higher resolution
than the cryo-EM images of the sample complex,
but the crystal structure could not be phased by
XRD alone. The phases derived from cryo-EM,
which are lacking from the crystallographic data,
have in some cases allowed the phasing and then
the complete structure determination of very
intricate complexes. A case in point is the crystal
structure of the TLR13-ssRNA complex, which
diffracted to 2.3 Å resolution but was phased with
a 4.8-Å cryo-EM map of the same complex
[38]. Another example is the T7 bacteriophage
portal tridecamer (gp8-13mer) and the
closed-conformation dodecamer (gp8closed)
structures. To phase the 3.4-Å gp8-13mer XRD
data, a partial model comprising 36% of the
structure was first built ab initio into a
5.8-Å-resolution cryo-EM map and was
successfully used as a molecular replacement
model; phase extension from 5.8 to 3.4 Å allowed
the tracing of the complete gp8-13mer structure,
a monomer of which was then used to solve the
3.6-Å gp8closed crystal structure, again by
molecular replacement [37, 39] (Fig. 9.4).

When solving the structure of macromolecular
biological assemblies, atomic model building is
probably the step where cryo-EM has its tightest
links to MX. Even though cryo-EM can nowadays
produce maps of macromolecular biological
complexes at resolutions better than 3 Å, thus
allowing de novo modelling of atomic structures,
the resolution and quality of the maps for many
cryo-EM structures are still not high enough to
support the use of auto-building software, and
therefore the building of atomic models relies on
the availability of thousands of atomic models
deposited in the PDB. Nevertheless, the quality of
the maps generated by cryo-EM is high enough to
allow almost unambiguous interpretation. This
achievement was supported by hardware
development, the most important being the
commercialization of direct electron detectors
(DEDs) [reviewed in 40]. DEDs record electrons
directly and can assign each electron to a specific
pixel. Before DEDs, detectors recorded photons
emitted when electrons hit a scintillator placed on
top of the detector. DEDs can also collect movies,
allowing the user to correct for beam- and
stage-induced movements of the sample.

Fig. 9.4 Combining MX with cryo-EM to solve the
crystal structure of T7 bacteriophage gp8-13mer.
(a) Poly-alanine model (36% of the structure) of
gp8-13mer built into the 5.8-Å cryo-EM map. This
initial model was used as the molecular-replacement
(MR) search model against a 3.4-Å XRD dataset.
(b) Density modification dramatically improved the
quality of the initial electron density map obtained
by MR using the poly-Ala model in (a) (top),
yielding a readily interpretable map for further
model building (bottom). (c) Final cartoon model of
gp8-13mer (PDB ID 6TJP). (Reproduced with
permission from [37])

The resolution revolution was also made possible
by improvements in the hardware of the electron
microscopes. In modern microscopes, the sample is
enclosed in the vacuum system of the microscope
and is handled by a mechanism called an
autoloader. Autoloaders allow the screening of up
to 12 samples in the same session and, importantly,
increase the stability of the whole microscope.
Microscope optics have also improved in stability.
Despite the improvements in the quality of the
maps generated by the new generation of
microscopes, for most cryo-EM maps de novo
building has been strongly dependent on homology
modelling and structure prediction made possible
by software such as I-TASSER [41] or AlphaFold
[42], which rely on the availability of atomic
models already deposited in the PDB that were
mainly produced by MX experiments. In the past
few years new tools have been developed and old
tools have been adapted to the needs of cryo-EM.
Namdinator is a web tool worth mentioning because
it is both straightforward to use and very reliable
[43]. Namdinator uses molecular dynamics to fit
an atomic model into a cryo-EM map. It can be
used from initial fitting to regularization of
geometries to correct outliers, clashes, and other
model problems. Macromolecular refinement and
model building programs central to MX like
PHENIX [35], REFMAC5 [44], and Coot [45]
have been adapted to the needs of cryo-EM.

More recently, two groups reported cryo-EM maps
with a resolution better than 2.0 Å, showing that
cryo-EM can now reach true “atomic resolution”
and can resolve details at the level of hydrogens,
paving the way for automatic de novo building
from cryo-EM maps as well. The Scheres
laboratory at the LMB in Cambridge, UK, obtained
a 1.7 Å resolution cryo-EM reconstruction for a
prototypical human membrane protein, the β3
GABAA receptor homopentamer, using a cold field
emission gun (cold FEG) as a new electron source,
an energy filter, and a new prototype of the Falcon
camera called Falcon IV [46]. When they used a
protein that is considered a standard for cryo-EM
studies (like lysozyme for MX), such as mouse
apo-ferritin, they reached 1.2 Å resolution. The
Stark laboratory at the Max Planck Institute for
Biophysical Chemistry in Göttingen, Germany,
reported a 1.25 Å resolution structure of
apo-ferritin obtained by cryo-EM with a
monochromator and a newly developed spherical
aberration corrector [47].

The high purchase and maintenance costs of
high-end electron microscopes constitute a
bottleneck for the spread of electron microscopy
to all the research centers that need it. Compared
with XRD, what cryo-EM is missing is the
equivalent of in-house X-ray sources and detectors
to allow the screening and preliminary data
collection of samples. Some progress has been
made in this direction. For example, Russo and
Henderson are working on a microscope that
combines low cost and high performance and that
would allow cryo-EM to spread widely [48]. In
their proof-of-concept set-up these authors used a
100-keV electron microscope, a widespread
instrument that is already used in many institutes
for screening cryo-EM grids and for negative
staining projects. Henderson and Russo have
shown that this low-end microscope can
nevertheless provide maps at high resolution when
equipped with an Eiger detector from Dectris and
a FEG as electron source. Such a scope has
purchase and running costs that are at least an
order of magnitude lower than those of high-end
microscopes. This is an outstanding achievement
because it paves the way for affordable cryo-EM
accessible to almost any laboratory. Scientists
could use these affordable scopes for preliminary
data acquisition and even for solving most of their
structures, and reserve the high-end scopes, for
example those present at synchrotrons, for
improving resolution or for very challenging
cases. This resembles the approach used in MX,
where most of the work is done in-house and
synchrotron radiation is used to improve the
resolution and quality of the data.

It is important to mention that electron
diffraction (ED) is gaining more and more
visibility in the field of structural biology. The
group of Tamir Gonen at UCLA collected ED data
from small 3D protein crystals and even extended
the technique to “invisible” crystals, as small as
300-800 nm across [49]. Recently, the same group
collected ED data from canonical lysozyme
crystals. They thinned the crystals with a focused
ion beam milling apparatus and then collected
high-resolution diffraction data by using a
low-intensity beam coupled with a new-generation
Falcon 4 detector in the mode that ensures
counting of single diffraction events. The
combination of these techniques generated such
high-quality data that the authors could solve the
phases with ab initio methods starting from a
short peptide with as few as three alanine residues
for lysozyme and a 14-residue α-helical fragment
in the case of proteinase K crystals [50].

Excitingly, new techniques are being developed
to acquire ED data. Several groups have set up
protocols that allow the continuous (or serial)
acquisition of data. Serial/continuous MicroED
has rapidly developed into a highly promising
field [51]. MicroED has distinct advantages over
conventional XRD, such as lower sample
consumption, smaller crystals, and the reuse of
XRD software (sometimes with some
modifications) for data processing and structure
determination. By filling the gaps between XRD
and cryo-EM, MicroED has the potential to
become a game changer in the field of structural
biology.

The software for cryo-EM data processing has
also improved over the years. It is not the aim of
this chapter to list all the new tools and software
available, but it is worth mentioning that Dimitry
Tegunov, together with Patrick Cramer’s and Julia
Mahamid’s groups, has developed a new software
tool called M, which allowed them to solve in situ
(i.e., directly in the cell) the structure of
ribosomes at 3.7 Å resolution. This is a milestone
in the field because it proves that cryo-electron
tomography can reach resolutions very close to
those reached by single particle analysis.
It also suggests that the available technology
is suited to high-resolution structure investigation
of complexes even in vivo, where the
bottlenecks are sample preparation and data
processing [52].

Many other experimental as well as theoretical 
approaches have been successfully combined 
with XRD, including NMR, MS, and others. Fur- 
thermore, theoretical approaches like molecular 
dynamics and quantum mechanical calculations 
require highly detailed and accurate crystal 
structures as starting points to ensure trajectory 
stability and a reasonable probability of finding 
the correct local minima during calculation.

9.10 Conclusions


• MX is a powerful and relevant technique for
structure determination of large multi-subunit
protein complexes.

• MX can be combined with other techniques,
including cryo-EM.

• MX can yield static as well as dynamic
information, and even time-resolved information.

10 XFEL for Macromolecular Complexes

Abstract


The advent of X-ray Free Electron Lasers 
(XFELs) has ushered in a transformative era 
in the field of structural biology, materials sci- 
ence, and ultrafast physics. These state-of-the- 
art facilities generate ultra-bright, femtosecond- 
long X-ray pulses, allowing researchers to 
delve into the structure and dynamics of molec- 
ular systems with unprecedented temporal and 
spatial resolutions. The unique properties of 
XFEL pulses have opened new avenues for 
scientific exploration that were previously con- 
sidered unattainable. One of the most notable 
applications of XFELs is in structural biology. 
Traditional X-ray crystallography, while instru- 
mental in determining the structures of count- 
less biomolecules, often requires large, high- 
quality crystals and may not capture highly 
transient states of proteins. XFELs, with their 
ability to produce diffraction patterns from 
nanocrystals or even single particles, have 
provided solutions to these challenges. XFELs
have expanded the toolbox of structural
biologists by enabling structure determination
approaches such as Single Particle Imaging
(SPI) and Serial Femtosecond Crystallography
(SFX). Despite their remarkable capabilities, the
journey of XFELs is still in its early stages, with
ongoing advancements aimed at improving
their coherence, pulse duration, and wavelength
tunability.


Keywords 


Structural biology · X-ray free-electron laser

10.1 Introduction

The advance of X-ray sources and their increasing
brilliance (from rotating anodes through
generations of synchrotrons to X-ray free-electron
lasers) has brought new insight to biology and has
yet to surpass the needs of the life science
community. Each advance in X-ray source,
instrumentation, automation, data analysis, etc.
leads to an increased understanding of the way
the natural world operates at the molecular level,
to more structures solved, and to higher resolution
in different states or even in the intermediate
steps of a complex reaction. The ultimate aim is to
visualize and document molecular processes,
observing individual atoms on a short enough time
scale to follow the transfer of charges from the
initiation of a reaction to its completion. Making
“molecular movies” of the processes vital to life
and abundant in nature offers the possibility to
harness these processes and put them to use for a
wide variety of applications, not only for the
understanding and treatment of disease [1], but
also in other disciplines, such as harnessing
photosynthesis for safe, cheap, and renewable
energy production [2, 3].

The advent of X-ray free-electron laser
(XFEL) sources continues the important role of
ever brighter X-ray sources in these fields
(Fig. 10.1). The major benefits of XFELs to
structural biology fall into two broad experimental
classes: serial crystallography and single particle
imaging.

Fig. 10.1 An XFEL experimental station at the
European X-Ray Free-Electron Laser Facility. The
SPB/SFX experiment station investigates
crystalline and non-crystalline matter, with a
particular emphasis on the determination of 3D
structures of biological objects. Examples include
crystals of macromolecules and macromolecular
complexes as well as viruses, organelles, and
cells. (Reproduced with permission. © European
XFEL/Jan Hosan)


10.2 Properties and Benefits of XFEL Sources


The main distinguishing feature of XFEL
sources compared with synchrotrons is their pulsed
nature, which gives extremely brilliant ultrashort
(femtosecond, 10⁻¹⁵ s) pulses of X-rays. The
intensity of each pulse from an XFEL provides
enough scattering to record a useful diffraction
pattern from single, sub-micrometer-sized protein
crystals.


10.2.1 Time Resolution 

This pulsed nature of an XFEL source gives the
opportunity to see femtosecond snapshots of
active processes at variable time delays from
the initiation of the reaction, which can be
triggered in a variety of ways (mixing, pump-probe,
temperature, or pressure jumps) (Fig. 10.2).
Measuring full data sets at each time delay with
femtosecond resolution gives the opportunity to
visualize “molecular movies” at atomic resolution,
meaning that the process can be observed
through each stage of multi-step reactions
(Fig. 10.2). This provides significantly more
understanding of the processes than can be
achieved by capturing only the longer-lived
transition states and interpreting the changes
between them, making it possible to observe and
understand the fastest reactions without depending
on trapped intermediates. This gives researchers
the ability to monitor and gain new understanding
of processes vital for life and of the fundamental
nature of biological systems.

Fig. 10.2 XFEL time resolution enables megahertz
data collection. Pulses from the European XFEL can be
focused on protein crystals in crystallization solution as
they are introduced into the focused XFEL beam using a
liquid jet moving at speeds between 50 and 100 m/s.
Diffraction patterns recorded from the sample are
measured using an AGIPD detector, which can measure
up to 3520 pulses per second at megahertz frame rates.
(Figure reproduced with permission from Ref. [4])
[Panel labels: European XFEL pulse structure, pulses 1 to 5
at 147, 1027, 1907, 2787, and 3667 ns (1.1 MHz); crystal
suspension delivered through a 3D-printed nozzle; AGIPD
detector, up to 352 frames at 1.1 MHz (3520 frames per
second); this experiment: 15 and 30 frames at 1.1 MHz
(150 and 300 frames per second).]


10.2.2 Signal From Smaller Crystals


As XFELs produce on the order of 10¹² hard X-ray
photons per pulse, crystals can be used that are
much smaller than those required for traditional
macromolecular crystallography (MX) at
synchrotrons. This relieves the burden of producing
“large” crystals for structure determination, a
bottleneck in many cases. Serial femtosecond
crystallography (SFX) at an XFEL source has
been performed on crystals as small as 500 nm.
The challenge, of course, is then to produce and
deliver these small crystals in high enough
numbers for SFX structure determination, which
requires a shift in the sample optimization
strategies, addressed later in this chapter.


10.2.3 Possibility to Follow 
Irreversible Time-Resolved 
Processes 


Synchrotron crystallography can efficiently
follow reversible processes by integrating the
signal at a given time delay of a process and
repeating the excitation and measurement on the
same sample. In contrast, irreversible processes
require fresh sample for each measurement,
dramatically increasing sample consumption as a
function of the signal and measurement time
required. Given that single frames of diffraction
collected from individual crystals are often
interpretable in XFEL serial crystallography,
following time-dependent processes that are
irreversible becomes not only possible but also
more sample efficient with XFEL studies than
with synchrotrons. This is relevant to so-called
pump-probe processes, which are started by an
optical “pump” (typically an optical laser pulse),
but also increasingly to mixing processes, which
are inherently irreversible and also require small
crystals for rapid mixing times [5].


10.3 Effects of XFEL Radiation 
on Biological Samples 


Radiation damage to samples has been a
long-standing and deeply studied topic since the
beginning of X-ray experiments. Strategies to
minimize the adverse effects and observe unaltered
data from sensitive biological samples have pushed
crystallographic research from room temperature to
cryo-conditions. The radiation dose limit of
traditional room temperature data collection is
approximately 0.2 MGy, whereas under cryo-cooled
conditions samples may withstand a dose roughly
two orders of magnitude greater (30 MGy) [6-8].
However, it has been reported that this dose limit
for room temperature experiments is dose rate
dependent [9] and can be dramatically increased
(up to 150 MGy) when using the highly intense
femtosecond pulses at XFEL facilities [10-13].
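
To put the three dose limits quoted above on a common scale, the short
calculation below expresses each as a fold increase over the conventional room
temperature limit. It is simple arithmetic on the numbers given in the text;
the dictionary labels and the function name are illustrative choices, not
established terminology.

```python
import math

# Dose limits quoted in the text, in MGy.
DOSE_LIMITS_MGY = {
    "room temperature, conventional": 0.2,
    "cryo-cooled": 30.0,
    "room temperature, XFEL femtosecond pulses": 150.0,
}


def fold_increase(limit_mgy, reference_mgy=0.2):
    """Ratio of a dose limit to the conventional room temperature limit."""
    return limit_mgy / reference_mgy


for condition, limit in DOSE_LIMITS_MGY.items():
    fold = fold_increase(limit)
    # log10 of the ratio gives the number of orders of magnitude.
    print(f"{condition}: {limit} MGy, {fold:.0f}-fold, "
          f"{math.log10(fold):.1f} orders of magnitude")
```

The cryo-cooled limit is thus about 150-fold (roughly two orders of magnitude)
above the conventional room temperature limit, and the limit reported for
femtosecond XFEL pulses is about 750-fold above it.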

As shown by the recent resurgence of interest
in room temperature data collection, especially in
biological research, the active (uncooled) forms
are the most interesting, particularly for
health-related research such as drug design. Such
room temperature structural studies are therefore
ideally suited to XFEL-based experiments, which
naturally do not use cryo-cooling. The ability to
access unaltered states of sensitive structures is
key to understanding their function. This strategy
is enabled by XFELs. The ultrashort pulses give
rise to diffraction before secondary changes to the
structure can take place, with the data collected
“outrunning” damage to the sample, even though
the sample may be destroyed by the probing pulse
(Fig. 10.3). The effects of XFEL radiation (dose
rate and exposure time) on biological samples are
an ongoing and highly active area of research
[14-16], and determining these effects has, until
recently, only been possible during data analysis
following an experiment. However, prediction
tools for the effects of XFEL radiation on
biological samples [17] are becoming available to
aid the study of radiation effects in XFEL
experiments, with the aim of supporting the
feasibility assessment and justification of XFEL
experiments in an experiment proposal.

The very short duration of XFEL pulses allows
the collection of diffraction data from samples
that, if exposed to a longer duration of X-ray
illumination, may change structure in the beam.
This is particularly valuable when determining
the structure of molecules containing one or more
metal centers, which are particularly sensitive to
radiation damage from X-ray beams [18].
Photoactive systems may also show alterations to
their structure at very low dose (0.06 MGy) [19];
these are not necessarily “damage” but effects on
the active site of the X-rays used to probe them.


10.4 Use of XFELs for Life Science 
Research 


10.4.1 Current State of the Art and “Standard” Experimental Options


XFEL experiments are continuing to develop.
The available capabilities, along with improved
ease of use, aim to facilitate access for a generation
of users new to the field. Standardization and
automation are central to this, and relatively
“standard” XFEL experiments in structural
biology are nowadays performed with high
reliability. Users of XFEL facilities are therefore
strongly recommended to familiarize themselves
with the standard options offered (available on the
facility webpages) as well as with how the
available instrumentation has been used
successfully by the community (lists of
publications are generally available on the
instrument webpages). A brief overview of the
possibilities offered at XFELs, by technique, is
provided below.

Fig. 10.3 Radiation damage in XFEL sources.
Artistic visualization of a serial crystallography
experiment showing a stream of crystalline
proteins being hit by an optical laser before being
struck by the X-ray laser. The microcrystals are
destroyed by radiation damage, but the information
about the arrangement of the atoms in the protein
is recorded to reconstruct a model of the structure
of the protein. (Reproduced with permission.
© European XFEL/Blue Clay Studios)


10.4.2 Crystallography 


The acquisition of atomic-resolution information
on fundamental biological processes has been a
significant motivation for developments for life
science experiments at XFELs. Structure
determination using X-ray FEL sources with small
micro- and nanometer-sized crystals of
biomolecules [10, 11, 20-23] has broadened the
scope of crystallography for biological structure
determination [24]. Building on the previous
experience of the crystallographic community,
data collection strategies that combine partial data
sets from a few crystals are now scaled up to many
thousands of crystals in the case of XFELs. This is
a necessity at XFELs because, owing to the power
of each pulse, the sample cannot survive and must
therefore be replaced by fresh sample for the
subsequent pulse (serial crystallography), which
requires fast injection speeds for MHz
repetition rate experiments [4, 25].
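
The need for fast injection can be made concrete with the numbers shown in
Fig. 10.2. The sketch below takes the pulse arrival times (147, 1027, 1907,
2787, and 3667 ns) and the jet speed range (50 to 100 m/s) from that figure and
computes the pulse spacing, the corresponding repetition rate, and how far the
jet travels between two consecutive pulses. The function and variable names are
illustrative only, and the calculation says nothing about how much displacement
is actually required to clear material disturbed by the previous pulse.

```python
PULSE_TIMES_NS = [147, 1027, 1907, 2787, 3667]  # pulse arrival times from Fig. 10.2
JET_SPEEDS_M_PER_S = (50.0, 100.0)              # jet speed range from the caption


def pulse_spacing_ns(times_ns):
    """Spacing between consecutive pulses; raises if the train is uneven."""
    gaps = {later - earlier for earlier, later in zip(times_ns, times_ns[1:])}
    if len(gaps) != 1:
        raise ValueError("pulse train is not evenly spaced")
    return gaps.pop()


def jet_travel_um(speed_m_per_s, spacing_ns):
    """Distance in micrometres that the jet moves between two pulses."""
    return speed_m_per_s * spacing_ns * 1e-9 * 1e6


spacing = pulse_spacing_ns(PULSE_TIMES_NS)  # nanoseconds between pulses
rate_mhz = 1e3 / spacing                    # repetition rate within the train
travel = [jet_travel_um(v, spacing) for v in JET_SPEEDS_M_PER_S]

print(f"pulse spacing: {spacing} ns ({rate_mhz:.2f} MHz)")
for speed, distance in zip(JET_SPEEDS_M_PER_S, travel):
    print(f"jet at {speed:.0f} m/s moves {distance:.0f} um between pulses")
```

With a spacing of 880 ns, a jet at 50 to 100 m/s advances by roughly 44 to 88
micrometres between pulses, which gives a sense of the scale on which fresh
sample is delivered at megahertz repetition rates.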

In comparison to synchrotrons, SFX at XFELs
addresses samples that (1) do not form large enough
crystals to provide an adequate signal-to-noise
ratio [21], (2) contain metal atoms that may
be easily altered chemically by longer exposure
times (and hence would not reflect the native
structure of the sample) [26], or (3) are
time-resolved systems where either femtosecond
time resolution is needed or, for example,
irreversible reactions such as mixing are to be
studied, requiring small crystals to minimize the
mixing time and define a clear t0 as the start of the
reaction [5, 27].

Time-resolved serial femtosecond crystallography,
especially at MHz data collection rates where
available and with reactions initiated by a laser or
by mixing, makes it possible to observe reactions
and record “molecular movies”, and thereby to
understand biochemical reactions involving
unstable and short-lived intermediates that cannot
be seen by other means.


10.4.3 Single Particle (Coherent 
Diffraction) Imaging 


One of the fundamental limitations of crystallog- 
raphy has always been the need for crystals. 
Unfortunately, crystallization is not trivial for all
samples, and many are either not yet crystallized
or simply not amenable to crystallization. The
need to obtain information on biological
processes where the components cannot be (or at
least have not yet been) crystallized has driven
developments at XFELs [28] as it has with other
methods (EM, NMR, SAXS).

The potential for Single Particle Imaging (SPI) 
of biological macromolecules using the extremely 
intense and ultrashort pulses from an X-ray Free 
Electron Laser was introduced to the broad scien- 
tific community in a landmark study [29], about a 
decade before the first hard-X-ray FEL came into 
operation. The prospects for FEL-based SPI 
(near-atomic 3D resolution, without the need 
for crystallization, using single biomolecules in 
aqueous solution) highlight the motivation to 
build FELs and the associated infrastructure 
such as dedicated beamlines [30]. The ultrabright, 
ultrashort XFEL pulses potentially allow enough 
diffracted signal to be collected from individual 
particles. As with SFX many frames can be com- 
bined to yield a three-dimensional diffraction pat- 
tern that can be interpreted (See Data reduction 
and analysis: SPI) to yield the three-dimensional 
electron density of the molecule in question. No 
other class of X-ray source can attempt such a 
measurement approaching resolutions relevant 
for biomolecules, and even at XFELs this method 
is still a work in progress with ever improving 
resolutions and smaller and smaller samples 
probed in each measurement [31-33]. 

Usually, the particles are injected as a focused
stream of aerosolized (See Choice of sample
injection: Aerosol), randomly oriented,
reproducible or nearly reproducible entities. If an
FEL pulse then coincides in space and time with a
particle in the X-ray focal plane, the scattered
radiation in the optical far field can be collected
on a two-dimensional detector downstream of the
interaction region.

A community-wide Single Particle Imaging
Initiative is striving towards the ultimate goal of
a 3-Å-resolution structure [34] with a systematic
approach to advance FEL-based SPI on smaller
particles. A limiting factor towards this goal is the
relatively small number of elastically scattered
photons resulting from the interaction of a
biological macromolecule with even a highly
intense FEL pulse [35, 36]: it becomes
increasingly challenging to distinguish the
resulting very photon-sparse diffraction patterns
[37], the so-called “hits”, from the majority of
diffraction patterns where there was no particle
(See Hit finding - Identification of frames with
usable diffraction data). With a so-called “hit
rate” between 0.1% and 1% [37], many pulses are
needed per usable pattern, and the advent of new
sources with MHz repetition rates, such as the
European XFEL and LCLS-II, now brings the
acquisition of complete SPI datasets within
easier reach [38, 39].
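
A back-of-the-envelope estimate shows why the repetition rate matters at such
hit rates. The sketch below computes the expected number of pulses and the
pure data collection time needed to accumulate a given number of hits. The hit
rates (0.1% and 1%) and the figure of 3520 frames per second (the AGIPD rate
quoted in Fig. 10.2) come from this chapter; the target of 100,000 hits and the
120 frames per second used for a lower-repetition-rate source are illustrative
assumptions, as are the function names. Beam downtime, sample changes, and
detector limits are ignored.

```python
def pulses_needed(target_hits, hit_rate):
    """Expected number of pulses required to record target_hits hits."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be a fraction in (0, 1]")
    return target_hits / hit_rate


def collection_time_hours(target_hits, hit_rate, frames_per_second):
    """Pure acquisition time, ignoring downtime and sample exchange."""
    return pulses_needed(target_hits, hit_rate) / frames_per_second / 3600.0


TARGET_HITS = 100_000          # hypothetical dataset size, not from the text
RATES_FPS = {
    "120 frames/s (assumed lower-rate source)": 120.0,
    "3520 frames/s (AGIPD, Fig. 10.2)": 3520.0,
}

for hit_rate in (0.001, 0.01):  # 0.1% and 1% hit rates quoted in the text
    for label, fps in RATES_FPS.items():
        hours = collection_time_hours(TARGET_HITS, hit_rate, fps)
        print(f"hit rate {hit_rate:.1%}, {label}: {hours:.1f} h")
```

Under these assumptions the acquisition time scales inversely with both the
hit rate and the frame rate, so moving from 120 to 3520 frames per second
shortens the collection by a factor of about 29 for the same hit rate.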


10.4.4 Scattering Techniques 


Measuring more than a single particle per exposure
increases the scattering statistics and is usually
performed in liquid jets rather than in an aerosol
(as for SPI). Fluctuation X-ray scattering (FXS)
[40, 41] provides size and shape data and takes
advantage of the variation in intensity around the
azimuth (which in traditional solution scattering
is uniform due to spherical averaging); it
therefore aims to provide higher-resolution
reconstructions. While solution scattering
experiments have been undertaken at FEL sources,
until recently these were limited to difference
experiments comparing ground and activated states
[42-44]. More recent efforts have provided proof
of principle of form factor recovery from
traditional solution scattering data collected at
an FEL source [45], despite the need for stable
background subtraction and the shot-to-shot
variation of SASE sources, which had previously
complicated such efforts.


10.4.5 Complementarity 
with Synchrotron Sources 


XFEL sources are not alone in the field of life
science research, and synchrotron experiments are
highly complementary for understanding biological
processes. Research time at any large-scale
facility is limited and faces fierce competition
during the application process. This is especially
true for XFEL sources: there are fewer XFELs than
synchrotrons, and each XFEL has fewer instruments
which, owing to the linear nature of these
facilities, typically cannot all operate at the
same time. Although efficient use of XFEL sources
through multiplexing is a high priority for the
facilities in order to maximize the time available
at their instruments, they still cannot offer as
much experimental time as synchrotrons, which are
both more numerous and each have many more
instruments than any XFEL.

Given the limited availability of XFEL beamtime, a
successful application must show that the
experiment is feasible. Feasibility covers not only
the concept for the data collection and the samples
to be measured; ideally, it must also be
demonstrated that the samples are expected to
diffract well enough to achieve the resolution
needed to interpret the subtle differences in the
structure. A proposal should therefore include
preliminary data collected at a synchrotron source.
This should not be seen as the synchrotron
facilities acting as a filter and a route to the
XFEL. Instead, synchrotron sources are truly
complementary and should be used to learn as much
about the sample as possible. For example,
synchrotrons are suitable sources for serial
crystallography on longer time scales (with
sufficiently large crystals), and as the
understanding of the system under study progresses,
the XFEL can be used to complement the existing
data, for example by observing shorter-timescale
behavior or by measuring smaller crystals to
acceptable resolution. Combining data collection at
a synchrotron, especially given the ongoing
upgrades that are increasing the brilliance of
synchrotron sources worldwide, with data collection
at an XFEL where appropriate is advised to maximize
the chance of success.


10.5 How to Prepare an XFEL 
Experiment 


XFEL instruments are typically, by design, more
flexible, owing to the wider experimental scope
necessitated by their limited number. As such,
experimental setups at XFELs often require more
preparation than, for example, optimized
experimental stations for MX at synchrotrons. The
experimental setup required depends on the sample
(delivery method) and the data to be collected
(instrument configuration, camera length, X-ray
energy). Although there is scope for reducing setup
time by scheduling similar experiments
sequentially, in many cases optimization is still
needed for the three essential requirements: beam
delivery, detection, and sample delivery. In the
case of time-resolved experiments, the
activation/triggering method (laser/mixing) must
also be set up. To be successful in the highly
competitive XFEL application systems, one must not
only submit a proposal with a highly motivating
science case but also demonstrate the feasibility
of all three key points, as well as of any
activation method (mixing or pump-probe) if
required. In practice, one should ideally show
preliminary data, demonstrate that the samples can
be prepared in the quality and quantity required
(see below), and explain why an XFEL is needed
instead of, or in addition to, synchrotron (or
other) experiments. Modeling and simulation
packages are also available to aid experimental
design and optimization; they help to extend the
preliminary data to show what should be practically
observable at an XFEL. These simulation and
modeling tools become even more important for the
more cutting-edge proposals that extend the
capabilities of the instruments.

Choices for the beam delivery depend strongly on
the facility's capabilities: X-ray energy, focus
size, power (energy and pulse duration) per pulse,
and sometimes even the repetition rate of the
delivered pulses. Detection is a choice of the kind
of detector, and most facilities offer a very
limited choice. A key parameter for a user is the
sample-to-detector distance, which is usually as
short as possible for SFX and as long as necessary
to sample non-crystalline diffraction appropriately
for coherent diffractive imaging (CDI) [46].
Additional detection capabilities for complementary
measurements can give highly valuable further
insights into the sample, for example, X-ray
emission spectrometry to detect oxidation states in
time-resolved studies [47]. The capabilities and
scope of each facility and instrument are
documented on their respective websites, and
publications of similar research at these
facilities can be highly valuable for the design
and optimization of your experiments. In addition,
early inclusion of the instrument scientists in the
design and proposal stage of a beamtime application
is highly encouraged to facilitate both the
application and the experiment. XFELs are
relatively novel facilities, and the rate of
improvement in instrumentation and capability is in
most cases very rapid. Discussion with the local
experts will enable one to make the best use of
each facility's capabilities in any given user
experiment.
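
The role of the sample-to-detector distance can be
made concrete with Bragg's law: a shorter distance
brings higher scattering angles, and therefore finer
resolution, onto the same detector area. The short
Python sketch below assumes a flat detector
perpendicular to the beam; the function name and the
example geometry are illustrative assumptions rather
than parameters of any particular instrument.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV*Å


def resolution_angstrom(photon_energy_kev, distance_mm, radius_mm):
    """Resolution d (Å) recorded at a given radius on a flat detector.

    Uses Bragg's law, d = wavelength / (2 sin(theta)), with the
    scattering angle 2*theta = atan(radius / distance).
    """
    if radius_mm <= 0 or distance_mm <= 0 or photon_energy_kev <= 0:
        raise ValueError("all arguments must be positive")
    wavelength = HC_KEV_ANGSTROM / photon_energy_kev
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


if __name__ == "__main__":
    # Hypothetical geometry: 9.3 keV photons, detector edge 100 mm from the beam.
    for distance in (120.0, 300.0, 1000.0):
        d = resolution_angstrom(9.3, distance, 100.0)
        print(f"distance {distance:7.1f} mm -> {d:6.2f} Å at the detector edge")
```

The same relation read in the other direction shows
why imaging experiments use long distances: moving
the detector back spreads the low-angle signal over
more pixels and so samples it more finely.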


10.6 XFEL Experiment Simulation 


Simulations of X-ray scattering have become an 
important part of experimental X-ray research. 
Simulations can be used to explore novel param- 
eter regimes and identify promising areas for 
experimental research where new phenomena 
may occur. A more practical aspect of simulations 
is that they often play a pivotal role during the 
design, execution, and analysis of an experiment. 
Recent advances in the development of theoreti- 
cal methods, and simulation codes, as well as in 
high performance computing (both software and 
hardware), have enabled simulations of complete 
experiments, describing the photon source, X-ray 
optics, interaction with and scattering from the 
sample, as well as X-ray registration in an X-ray 
detector. Such start-to-end simulations allow one 
to simulate realistic experimental data that take 
into account various imperfections, e.g., in the 
source’s pointing stability, temporal and spatial 
structure, optical elements like mirror height and 
slope errors, artefacts introduced by lenses, radia- 
tion damage of the sample during irradiation with 
the probing X-ray beam, as well as detector noise 
[1, 36, 48]. All these effects and their impact on 
the measured data can be studied in isolation or in 
combination with each other. The simulation suite 
SIMEX [36] provides a platform for start-to-end 
simulations for a broad range of X-ray based 
experiments, including single-particle imaging, 
SAXS/WAXS, absorption spectroscopy, and 
inelastic X-ray scattering. It bundles simulation
codes for the propagation of X-rays from the source
to the interaction point, photon-matter
interaction, and detection of scattered or
transmitted radiation. For detectors at the
European XFEL, these simulations also include the
actual detector calibration constants. The SIMEX
simulation environment allows X-ray facility users
to predict observable data under realistic
experimental conditions and to evaluate how the
data and data characteristics (e.g.,
signal-to-noise ratio, hit rates) scale with
machine parameters such as the pulse duration,
bandwidth, or wavefront profile. This information
can then be used to optimize the experimental
configuration for maximum data quality. The
simulations can be run using simplified models for
quick qualitative estimates, e.g., online during an
experiment, or using more sophisticated ab initio
methods for accurate predictions and data analysis.
Another
potential future application of SIMEX is its use 
during evaluation of beamtime proposals, i.e., 
reviewers can use the simulations to assess 
feasibility and potential impact of proposed 
experiments. In view of the very limited number
of XFEL sources and the high costs of operating
these facilities and providing beamtime to users,
simulation and modelling tools become crucial for
making the most efficient use of the precious and
scarce beamtime. Demonstrating the likelihood of
an experiment’s success through start-to-end
simulations would therefore be highly beneficial to
beamtime proposals, especially where preliminary
data cannot be measured experimentally without an
XFEL.


10.7 Sample Characteristics 
10.7.1 Crystal Size and Estimation 
of Amounts Needed 


The tendency for XFEL serial crystallography
experiments is towards smaller crystals, of
micrometer or even nanometer size. The smaller
volume is desirable not only for the diffusion of
ligands and substrates but also for
photoexcitation, so that the entire crystal is
excited rather than only the surface (owing to the
limited absorption length). In addition, small
crystals are better suited for injection, as they
minimize the risk of clogging, and diffraction
quality may be better in smaller single crystals,
as there may be fewer dislocations and defects
disrupting the crystal lattice. The crystal
concentration should be as high as possible (40% by
volume of crystal to solution) to achieve high hit
rates. However, the higher the concentration, the
greater the possibility of problems with settling
and clogging, which interrupt sample delivery. This
can be highly detrimental to reliable
high-throughput data collection. The sample volumes
required depend on the flow rate needed for jet
stability, for jet replenishment in MHz
experiments, or to achieve the desired time points
in a mixing experiment. It is therefore not
straightforward to give a single estimate of the
material required that would be applicable to all
cases, though an overly simplified estimate would
be that 200-300 µl of concentrated crystal
suspension can provide a single data set. For these
reasons, the sample delivery method (see Choice of
Sample Injection Options) should be tested in
advance in collaboration with the facility staff to
determine the optimal sample conditions and amounts
required for any given experiment.


10.7.2 Single Particle Sizes (Easy 
to Challenging) and Estimation 
of Amount Needed 


Complete 3D reconstructions of single particles
have been obtained for viruses in a size range from
several hundred nanometers down to 70 nm in
diameter [49], in the latter case to a full-period
resolution of around 10 nm [50]. Single hits from
smaller viruses (diameter: 40 nm) have been
obtained using hard X-rays (5.5 keV) [50], with
signal above background down to about 4 nm
resolution. At higher photon energy (7 keV), signal
above background was obtained from 70-nm virus
particles down to 5.9 Å resolution, indicating the
potential of FEL-based SPI for higher-resolution
structures than obtained so far.

Particles have been injected at concentrations
between 10^11 and 10^12 particles per ml with a gas
dynamic virtual nozzle (GDVN) [51-53] using a flow
rate between 1 and 3 µl/min. Given that a total
dataset can require tens of hours of data
collection time, a few milliliters of sample
solution are sufficient for a beamtime of several
days.
Using a GDVN in electroflow focusing mode, a 
higher concentration (10'% particles per ml) and 
lower flow rate (0.3-1 µl/min) have been used
[37]. The sample’s size distribution and
homogeneity should be checked independently prior
to the experiment [52, 53].
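
The statement that a few milliliters suffice follows
directly from the flow rate and the collection time.
The sketch below reproduces that arithmetic in
Python, reading the quoted flow rates as microliters
per minute; the function name and the chosen
durations are illustrative assumptions.

```python
def sample_volume_ml(flow_ul_per_min, hours):
    """Sample volume (ml) consumed at a constant flow rate over `hours`."""
    if flow_ul_per_min < 0 or hours < 0:
        raise ValueError("flow rate and duration must be non-negative")
    return flow_ul_per_min * 60.0 * hours / 1000.0


if __name__ == "__main__":
    # Assumed durations of continuous injection, at the two ends of the
    # 1-3 µl/min range given in the text.
    for hours in (10, 20, 40):
        low = sample_volume_ml(1.0, hours)
        high = sample_volume_ml(3.0, hours)
        print(f"{hours:3d} h of injection: {low:.1f} to {high:.1f} ml")
```

At 1 µl/min, a full day of uninterrupted injection
consumes about 1.4 ml, which is consistent with the
estimate of a few milliliters for a beamtime of
several days.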


10.7.3 On-Site Sample Preparation 
and Biophysical 
Characterization 


Current trends in life science research require
working with biological samples that are
increasingly complex. This is even more critical
for samples intended for study at XFEL sources:
because they need such a source, they are expected
to be functional, with strong measurable activity
to enable time-resolved experiments. Furthermore,
as the activity is often induced by light or
oxygen, the samples can be highly sensitive and
therefore unstable over the time scales and at the
temperatures expected during production, shipping,
or final preparation. Large-scale research
infrastructures (synchrotrons and XFELs) with
applications in structural biology often include
“on-site” sample preparation laboratories, which
are essential for providing active and/or
functional samples of the highest quality and hence
give the best possible chance of observing the
sensitive biochemistry. In addition, these
facilities must provide adequate complementary
sample characterization to enable quality control
and sample prioritization for any given measurement
with beam. This additional information not only
aids informed decisions during data collection but
also assists and complements data analysis and
supports the resulting conclusions.

All large-scale X-ray facilities worldwide with
an interest in life sciences provide some form of
access to biological laboratories as part of access
to the facility for experiments. The scope of the
laboratories provided is related to each facility's
research portfolio and is continually being
improved. Each facility maintains up-to-date
details on its website regarding the techniques and
equipment currently available. Careful preparation
during the application process is advised, to
understand which of the additional facilities
offered will benefit the proposed experiment and to
include them in the beamtime application. It is
therefore suggested to contact the facility staff
responsible for sample preparation and
characterization during the early stages of
developing ideas for experiments.


10.7.4 Choice of Sample Injection 
Options 


The choice of sample delivery is complex, as it
depends strongly on the sample to be studied.
General guidelines and examples are given here;
however, we again recommend that you speak with the
instrument scientists and staff responsible for
sample delivery at the facility where you intend to
undertake your experiment to find the best solution
for your particular sample(s). For most XFELs, the
prevalent methods are serial, as the FEL power is
not conducive to multiple measurements at the same
sample position (see above).

FEL experiments employ three main classes of 
sample delivery: (1) liquid jets for delivering 
(primarily) small crystals to the X-ray FEL 
beam for serial crystallography (SX, SFX), 
(2) focused aerosol beams for delivering
(primarily) non-crystalline particles to the X-ray
FEL beam for single particle imaging (SPI), and
(3) samples arranged on fixed targets, which 
may be crystalline or non-crystalline. 


10.7.5 Liquid Jets 


For sample delivery into vacuum, jet-producing
nozzles are mounted at the end of a hollow rod
containing the liquid and gas lines for sample
delivery and jetting control. This assembly is
inserted via a load lock to position the nozzle
above the X-ray focus, allowing exchange without
the need to vent the entire chamber. The nozzle rod
mates to a catcher centered around the interaction
region. The catcher restricts the majority of
sample residue to a small, easy-to-clean volume and
provides a degree of differential pumping of the
liquid and gas loads to ensure that the pressure in
the main interaction chamber (and detector) stays
below the threshold required for safe operation
[54].

The most frequently used methods to establish
jets are electrospinning [55] and gas dynamic
virtual nozzles (GDVNs) [56]. Electrospinning uses
an electric field to accelerate the liquid, whereas
a GDVN uses gas (typically He) to apply pressure to
a liquid sample jet emerging from a capillary of
typically 50-100 µm, which constricts the flow to a
few µm in diameter and thereby accelerates the
liquid. Both produce a very thin and fast jet
capable of delivering crystals in suspension to the
interaction region at a rate optimized for data
collection. GDVNs can provide fast jets with
velocities over 80 m/s [57], which were recently
shown to deliver sample successfully at a rate
compatible with the current operational pulse train
structure of the FEL beam at the Single Particles,
Clusters, and Biomolecules (SPB)/SFX instrument
[4, 57].
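
Why jet speed matters at high repetition rates can
be seen from how far the jet advances between two
consecutive pulses: this distance has to exceed the
length of jet disturbed by the previous pulse. The
Python sketch below computes that distance. The
pulse rates and the required gap used in the example
are illustrative assumptions, not facility
specifications.

```python
def jet_travel_um(jet_speed_m_per_s, pulse_rate_hz):
    """Distance (µm) the jet advances between two consecutive pulses."""
    if pulse_rate_hz <= 0:
        raise ValueError("pulse_rate_hz must be positive")
    return jet_speed_m_per_s / pulse_rate_hz * 1e6


def min_jet_speed_m_per_s(required_gap_um, pulse_rate_hz):
    """Jet speed (m/s) needed to clear `required_gap_um` between pulses."""
    return required_gap_um * 1e-6 * pulse_rate_hz


if __name__ == "__main__":
    # 80 m/s is the jet speed quoted in the text; the rates are assumed.
    for rate_hz in (1.0e5, 1.0e6, 4.0e6):
        gap = jet_travel_um(80.0, rate_hz)
        print(f"{rate_hz:9.0f} Hz -> jet advances {gap:7.1f} µm per pulse")
    # Hypothetical requirement: 50 µm of fresh jet per pulse at 1 MHz.
    print(f"speed needed: {min_jet_speed_m_per_s(50.0, 1.0e6):.0f} m/s")
```

An 80 m/s jet advances 80 µm per microsecond, so at
pulse spacings of about a microsecond each pulse
meets jet material tens of micrometers away from the
previous interaction point.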

High viscosity extrusion jet sample delivery
[58] works in a similar way to GDVNs, but the
viscosity limits the constriction and acceleration
that can be achieved. This results in lower jet
speeds, and therefore lower usable frame rates,
than for low-viscosity liquid jets. However, the
lower speed increases the probability of hitting a
crystal and generally results in higher hit rates.
As membrane proteins can be crystallized in lipidic
cubic phase and injected directly in this medium,
these nozzles enable measurements on insoluble but
very important proteins that could not otherwise be
investigated.


10.7.6 Mixing Jets 


Mixing experiments provide important insights
into the structure-function relationship of
proteins in operando, by taking sequential
snapshots at different time points after mixing
[5, 27]. To date, XFEL mixing experiments on
crystal suspensions have given access to the time
scale of seconds [59, 60]. However, the
macromolecular conformational rearrangements that
underlie signaling, transport, catalysis, and
assembly happen on µs-ms time scales. There is
therefore great interest in developing rapid mixing
injectors to access such processes. Mixing devices
that incorporate the GDVN geometry to reduce sample
consumption have been developed and consist of
three capillaries arranged in a coaxial
configuration [61, 62]. These devices allow two
reagents to co-flow and mix in the two inner
capillaries before being focused by the gas flowing
through the outer capillary. Short reaction times
can be probed by using small crystals, which
minimize diffusion times. For example, a crystal
with dimensions of 0.5 × 0.5 × 0.5 µm³ results in a
modelled diffusion time of 17 µs, while a
3 × 4 × 5 µm³ crystal is estimated to take 1 ms,
and a large 300 × 400 × 500 µm³ crystal would take
9.5 s [5]. By adapting the distance between the
mixing point and the probed position, and the flow
rates, a range of timescales can be probed. The
GDVN principle and the jet-in-jet geometry can also
be integrated into microfluidic devices fabricated
by soft lithography [63]. The advantage of such
devices, made of PDMS (polydimethylsiloxane), lies
in the reproducibility of their microstructure. The
concept has also been extended to other materials,
such as glass and silicon, which show a stronger
resistance to aggressive chemicals and to high
pressure. Another approach is to combine 3D-printed
nozzles [64] and 3D-printed mixing regions with
capillaries and/or microfluidics.
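
The strong dependence of diffusion time on crystal
size can be illustrated with a simple scaling rule.
The Python sketch below assumes that the diffusion
time grows with the square of the smallest crystal
dimension and anchors the scale to the smallest
crystal quoted above (0.5 µm, 17 µs). This quadratic
rule is a common rough approximation and is an
assumption made here for illustration; it is not the
model used in Ref. [5].

```python
def scaled_diffusion_time_s(size_um, ref_size_um=0.5, ref_time_s=17e-6):
    """Rough diffusion time (s), assuming time scales as size squared.

    size_um is the smallest crystal dimension in µm; the reference
    pair (0.5 µm, 17 µs) is the smallest crystal quoted in the text.
    """
    if size_um <= 0 or ref_size_um <= 0:
        raise ValueError("sizes must be positive")
    return ref_time_s * (size_um / ref_size_um) ** 2


if __name__ == "__main__":
    # Smallest dimensions of the three crystals discussed in the text.
    for size in (0.5, 3.0, 300.0):
        t = scaled_diffusion_time_s(size)
        print(f"{size:6.1f} µm -> {t:.3g} s")
```

With these assumptions the estimates for the 3 µm
and 300 µm crystals (about 0.6 ms and 6 s) agree
with the modelled values quoted above to within a
factor of two, which is enough to judge which time
points a given crystal size can support.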


10.7.7 Aerosol 


The small scattering cross section and 
non-crystalline nature of single particles require 
that the scattering background is reduced to a 
minimum. In aerosol-based sample delivery, the 
aim is to isolate the sample from any surrounding 
liquid, removing scattering from the delivery 
medium that would otherwise overwhelm the 
weak sample scattering signal. 

An aerosol of (sub-)micron-sized droplets, each
containing on average one sample particle, is
generated by a GDVN or, for smaller particles and
therefore smaller droplets, by electrospray [65].
The droplets evaporate, leaving behind isolated
sample particles that are funneled into an
aerodynamic lens [66, 67]. Within the aerodynamic
lens, the particle flow is focused into a narrow
beam that intersects the X-ray beam at the exit of
the lens stack, enabling efficient delivery of
particles ranging from 30 to 3000 nm in diameter.

The nozzle-to-interaction particle transmission 
varies with both particle size and gas flow, and 
can reach 70% for particles of a few hundred 
nanometers in diameter. The exit velocity also 
depends on particle diameter and gas flow, with 
sub-100 nm particles reaching 200 m/s, while 
micron sized particles travel slower at approxi- 
mately 20 m/s [66]. 

Potential complications of limited liquid jet 
speeds in the high repetition rate FEL beam, 
such as sample replacement and jet disruption, 
can be alleviated with aerosol sample delivery, 
where no surrounding liquid is present. The lack 
of surrounding liquid also enables the use of ion 
time-of-flight spectroscopy as a potential means 
to provide confirmation that an X-ray pulse 
intersected with a sample [68]. These features 
make aerosol sample delivery an intriguing possi- 
bility for delivering crystalline samples into the 
X-ray beam. 


10.7.8 Droplet Injection 


Performing experiments at atmospheric pressure
(usually in a He atmosphere to reduce background
scattering) opens up the possibility of using other
delivery methods, such as drop-on-demand or
acoustic droplet ejection. The droplets can be
probed after deposition on a moving belt [69],
where they can additionally be activated by
multiple optical laser pulses for complex reactions
or passed through an oxygen-rich environment. A
deposited drop can also be merged with a second
drop that initiates the reaction. Alternatively,
droplets can be probed in flight [69, 70], and
reactions can be triggered by mixing two ballistic
droplets [71]. Although droplet injection gives
access to more complicated reactions, it is
currently limited by the repetition rate of the
ejection apparatus and, in the case of drops on a
belt, by the speed of the belt. As such, these
experiments are currently limited to the kHz range,
and sample delivery is the limiting factor at
facilities with MHz repetition rates.


10.7.9 Fixed Targets 


Injection methods are inherently inefficient in
their use of sample: most of the sample volume,
which must flow constantly, is never exposed
because of the pulsed nature of the XFEL source. In
contrast, mounting samples on a fixed target is
highly sample efficient: by using support grids to
help align the crystals, or by simply scanning
through a loop, nearly all crystals can be exposed
to X-rays and data collected. However, moving the
target to a fresh spot takes time, and to date only
10-120 Hz repetition rates have been achievable at
FEL sources. Using highly optimized systems such as
the roadrunner [72], data collection rates in the
kHz range have been reported [73]. Combining fixed
target data acquisition with a humidity- and
temperature-controlled environment also allows
additional control of the local sample environment,
which provides an advantage compared to “in vacuum”
data collection.
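
The inefficiency of continuous injection can be
estimated by comparing the length of jet probed per
second with the length of jet that flows past per
second. The Python sketch below makes the
simplifying assumptions that each pulse probes a
length of jet equal to the focus diameter and that
pulses arrive evenly spaced; the function name and
the example values are illustrative only.

```python
def probed_fraction(focus_um, pulse_rate_hz, jet_speed_m_per_s):
    """Fraction of a continuously flowing jet that is hit by X-ray pulses."""
    if focus_um <= 0 or pulse_rate_hz <= 0 or jet_speed_m_per_s <= 0:
        raise ValueError("all arguments must be positive")
    probed_um_per_s = focus_um * pulse_rate_hz
    flowing_um_per_s = jet_speed_m_per_s * 1e6
    return min(1.0, probed_um_per_s / flowing_um_per_s)


if __name__ == "__main__":
    # Hypothetical case: 3 µm focus, 120 pulses per second, 10 m/s jet.
    fraction = probed_fraction(3.0, 120.0, 10.0)
    print(f"fraction of sample probed: {fraction:.1e}")
```

In this hypothetical case only a few parts in
100,000 of the flowing sample are ever exposed,
which is the motivation for fixed targets when
sample is scarce.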


10.8 Serial Femtosecond 
Crystallography (SFX) 


A simplistic overview of SFX data processing
does not necessarily need to deviate significantly
from traditional X-ray diffraction approaches. Once
a useful data collection strategy is identified,
peak finding (identification of Bragg reflections),
background subtraction (using integration of the
area around the Bragg peak), indexing, and merging
are performed before a structural model can be
proposed [74]. As with traditional methods, most
steps can be iteratively improved as more
information is revealed. However, the nature of SFX
experiments imposes novel hurdles at each of these
steps. This short section aims to give the reader
an insight into the problems and the solutions
available, with the goal of making novice
crystallographers aware of some of the pitfalls in
SFX data analysis. For further details we encourage
the reader to follow the references given for each
method.


10.8.1 Comparison with Single Crystal
Crystallography


Goniometer-based MX experiments have the benefit
of data collection efficiency, as each frame
contains data and each subsequent frame is related
to the previous ones by a known rotation angle,
making data analysis relatively simple. Since SFX
experiments typically do not use a single crystal
and often do not control crystal orientation, large
quantities of data (and large quantities of
crystals) are required to ensure that a dataset is
complete. In some cases, this can be on the order
of millions of images and therefore millions of
crystals [75]. Since SFX data collection is
stochastic in nature, the final data can contain
“empty” frames, frames with reflections from a
single crystal, and frames with reflections from
multiple crystals. In this chapter, we focus on
datasets containing “empty” and single-crystal
images. Images containing multi-crystal reflections
are not considered here in any detail; separating
the overlapping lattices is relatively trivial when
few crystals are present and is not specific to SFX
datasets.

The most significant difference between MX and
SFX data sets stems from the “femtosecond” nature
of the data collection (Fig. 10.4). The term refers
to the X-ray pulse length and therefore to the
length of time for which each crystal is exposed
[12, 29, 75]. As each crystal is exposed only once,
the effective rotation of the crystal becomes zero
degrees. To the uninitiated this may seem trivial;
however, it is in fact one of the biggest
considerations in SFX data analysis. The common aim
of goniometer-based crystallography experiments is
to measure the full intensity of each possible
reflection and relate it to the number of electrons
in a given reflection plane [74]. While SFX
experiments have the same final goal, the
limitation of measuring only partial reflections,
because each exposure samples only a very limited
rotation, means that a different analytical
approach is needed. The following is an overview of
the SFX data reduction and analysis process.

Fig. 10.4 Experimental SFX setup. A schematic
diagram of the experimental setup in the vicinity
of the sample interaction region on the SPB/SFX
beamline, showing the X-ray laser beam, the
interaction point, the optical side-microscope
camera, the spent sample catcher, and the beam
dump. In contrast to the conventional rotation
method implemented in X-ray crystallography setups,
in the SFX setup the goniometer head does not
rotate and the sample is delivered as a continuous
jet stream. The optical side microscope camera
(Oxford Instruments Andor Zyla 5.5 sCMOS) records
images at a rate of 10 Hz to match the frequency of
pulse trains arriving from the XFEL source. (Figure
reproduced with permission from Ref. [76])


10.8.2 Calibration of Raw Data


Area detectors typically used at FEL facilities
have central holes to allow the intense, direct
X-ray pulses to pass through the detector. Each
sensor module can be moved, since the detector
geometry should be optimized for each experiment.
The position of each pixel in a frame must
therefore be determined for meaningful data
interpretation. For SFX data, well-defined peaks
allow for a crosscheck, as any errors in the
determined detector geometry can be identified and
revised during data analysis. However, a verified
initial calibration reference is valuable to aid
optimization. A common method is to measure and
analyze the powder diffraction or small-angle
scattering signal of a known calibration sample
before the main experiment. Lithium titanate and
silver behenate are common choices of calibration
sample, with the preference depending on the
sample-to-detector distance of the experiment.
Since XFEL pulse durations are extremely short
(~10 fs), pulse-resolved photon counting detectors
are not useful for XFEL pulses, and only charge
integrating detectors, e.g., the Adaptive Gain
Integrating Pixel Detector (AGIPD) [77] at the
European XFEL, are usable (Fig. 10.2). The
diffraction intensities measured by these charge
integrating detectors need post-correction
processes, e.g., baseline correction (detector
offset), background subtraction, and
analog-to-digital unit (ADU) conversion, which
result in photon counts for each pixel in an image
[78].
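
The correction steps named above can be illustrated
with a deliberately simplified Python sketch: a dark
offset is subtracted from each pixel and the result
is divided by the signal of one photon in ADU. The
single fixed gain, the function name, and the
example values are assumptions made for
illustration; detectors with adaptive gain, such as
the AGIPD, require gain-dependent constants for each
pixel, which this sketch does not model.

```python
def adu_to_photons(raw, dark_offset, adu_per_photon):
    """Convert raw ADU values to photon counts, pixel by pixel.

    raw and dark_offset are 2D lists of equal shape; adu_per_photon
    is the (assumed uniform) signal of a single photon in ADU.
    """
    if adu_per_photon <= 0:
        raise ValueError("adu_per_photon must be positive")
    photons = []
    for raw_row, dark_row in zip(raw, dark_offset):
        row = []
        for adu, dark in zip(raw_row, dark_row):
            # Baseline (offset) correction, then ADU-to-photon conversion.
            count = round((adu - dark) / adu_per_photon)
            # Negative values are noise below the baseline: clip to zero.
            row.append(max(count, 0))
        photons.append(row)
    return photons


if __name__ == "__main__":
    raw = [[1010, 1000], [1125, 998]]       # hypothetical raw frame (ADU)
    dark = [[1000, 1000], [1000, 1000]]     # hypothetical dark offsets (ADU)
    print(adu_to_photons(raw, dark, adu_per_photon=60.0))
```

Rounding to whole photons is what makes the sparse
frames of SFX and SPI tractable: most pixels become
exactly zero, and the remaining values follow
counting statistics.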


10.8.3 Hit Finding — Identification 
of Frames with Usable 
Diffraction Data 


Detection of frames containing diffraction peaks,
or “hits”, is arguably the most important part of
the data analysis pipeline. Peak detection is the
foundation of the crystallographic data and, as
with MX experiments, it is a priority for rapid
feedback during experiments [35, 79, 80].
Subsequent steps such as indexing depend on the
accurate geometry of the diffraction pattern [81].

The large quantity of FEL-SFX data precludes
manual inspection of each image. Typical FEL-SFX
experiments can produce many terabytes of data or
more, and an automated approach is needed for the
initial identification of useful frames. With poor
peak detection, the best-case scenario is the loss
of some weaker frames as well as incorrect
background subtraction; in the worst-case scenario,
a good-quality dataset is deemed unusable and
discarded.

One approach for automated image evaluation 
is radial integration, where a threshold is deter- 
mined from the radial average and standard devia- 
tion of the diffracted intensity. Pixels found above 
the threshold are excluded and the process iterated 
several times. Once a reasonable threshold has 
been determined the outlying pixels above this 
threshold are used for peak analysis. The outliers 
(or potential reflections) are checked for a satisfac- 
tory number of adjacent outlier pixels. If the outlier 
is sufficiently connected, then this region of the 
detector has likely measured a reflection and will 
be used for further analysis. Background subtrac- 
tion can be performed by a “3 concentric ring” 
method where the inner ring contains the expected 
peak, the middle ring contains an ignored spacer 
region, while the outer ring is then used for the 
background measurements. 
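
The procedure described above can be sketched in a
few lines of Python. This is a minimal illustration
under stated assumptions and not the implementation
of any particular hit-finding package: the function
names, the default parameters (threshold in standard
deviations, number of iterations, minimum cluster
size, ring radii), and the use of 4-connected
neighbours are all choices made here.

```python
import math
from statistics import mean, pstdev


def _radius(y, x, center):
    return int(round(math.hypot(y - center[0], x - center[1])))


def radial_thresholds(image, center, n_sigma=3.0, n_iter=3):
    """Per-radius threshold (mean + n_sigma * std), iterated while
    excluding the pixels found above the threshold."""
    bins = {}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            bins.setdefault(_radius(y, x, center), []).append(value)
    thresholds = {}
    for r, kept in bins.items():
        for _ in range(n_iter):
            limit = mean(kept) + n_sigma * pstdev(kept)
            kept = [v for v in kept if v <= limit]
        thresholds[r] = mean(kept) + n_sigma * pstdev(kept)
    return thresholds


def find_peaks(image, center, n_sigma=3.0, n_iter=3, min_pixels=2):
    """Return intensity-weighted centroids (y, x) of connected outliers."""
    thr = radial_thresholds(image, center, n_sigma, n_iter)
    outliers = {(y, x) for y, row in enumerate(image)
                for x, v in enumerate(row) if v > thr[_radius(y, x, center)]}
    peaks = []
    while outliers:
        stack, cluster = [outliers.pop()], []
        while stack:  # flood fill over 4-connected outlier pixels
            y, x = stack.pop()
            cluster.append((y, x))
            for nb in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if nb in outliers:
                    outliers.remove(nb)
                    stack.append(nb)
        if len(cluster) >= min_pixels:  # reject isolated hot pixels
            total = sum(image[y][x] for y, x in cluster)
            peaks.append((sum(y * image[y][x] for y, x in cluster) / total,
                          sum(x * image[y][x] for y, x in cluster) / total))
    return peaks


def integrate_peak(image, peak, r_peak=2.0, r_gap=3.0, r_bg=5.0):
    """Three-ring integration: signal within r_peak, ignored spacer up to
    r_gap, local background estimated from r_gap < d <= r_bg."""
    signal, background = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d = math.hypot(y - peak[0], x - peak[1])
            if d <= r_peak:
                signal.append(value)
            elif r_gap < d <= r_bg:
                background.append(value)
    return sum(signal) - len(signal) * mean(background)
```

A frame would then be classed as a hit when
`find_peaks` returns at least some minimum number of
peaks; that minimum is a further tunable parameter
and is left to the caller here. Pure Python is used
only to keep the sketch self-contained, and a
production hit finder would use array operations for
speed.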

Once a catalogue of data-containing images and
relative peak locations has been determined, the
next step is often indexing, the process of
relating the identified reflections to a particular
unit cell and space group.


10.8.4 Indexing 

Indexing programs commonly used in MX experiments include XDS [82] and MOSFLM [83], which are also utilized in some SFX data analysis software [79, 80, 84]. Each data-containing image can be indexed and will yield a set of dimensions related to the unit cell. Plotting a histogram of each dimension determined from each image should result in a distribution around the most likely parameter. Any heterogeneous populations of crystals will become evident if multi-modal distributions appear in the histogram. This type of simultaneous collection of “multi-data” is not typically performed in MX-style experiments. However, an SFX experiment with a multi-crystal sample would require enough frames for each of the crystal types. The required data collection time is therefore effectively multiplied according to the number of “types” present and their proportions in the sample, so that the statistics necessary to solve a structure can be collected for each “type”.
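The histogram check described above can be illustrated as follows. This is a hedged sketch only: `find_modes`, the 0.5 Å bin width, and the 10% height cut-off are assumptions made for the example, and real pipelines usually inspect all six cell parameters together.

```python
import numpy as np


def find_modes(values, bin_width=0.5, min_fraction=0.1):
    """Histogram one unit cell parameter and return the populated local maxima."""
    values = np.asarray(values, dtype=float)
    lo = np.floor(values.min() / bin_width) * bin_width
    hi = np.ceil(values.max() / bin_width) * bin_width + bin_width
    edges = np.arange(lo, hi + bin_width / 2, bin_width)
    counts, edges = np.histogram(values, bins=edges)
    padded = np.concatenate(([0], counts, [0]))
    # A mode is a bin higher than its left neighbour and not lower than its right one.
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    # Ignore sparsely populated bins, which are more likely mis-indexed frames.
    is_peak &= counts >= min_fraction * counts.max()
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[is_peak], counts[is_peak]
```

More than one returned mode for a cell axis suggests a heterogeneous crystal population, and the counts per mode give a rough idea of how many additional frames the minority population would need.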

The process of indexing each data-containing image is subtly different for SFX than for MX datasets. Some crystallographic data can be indexed as its twin if the symmetry of the Bravais lattice is higher than the space-group symmetry. In the case of MX data this is avoided, because the frames are geometrically related to each other, allowing the user to test all possible indexing choices and find the best fit. SFX does not have this luxury: each image is independent, so if one were to continue through the subsequent merging steps, one would obtain an apparently perfectly twinned dataset even if the crystal sample is not twinned. This is known as the “indexing ambiguity” [85].

Several solutions to the indexing ambiguity [85, 86] have remained robust under most circumstances, though it must be noted that some modern indexing methods using minimal data still suffer from indexing ambiguities [87]. The chosen indexing algorithm should be robust against the indexing ambiguity; otherwise, it should only be used if the space group is known beforehand.

Once indexing is successful, the detector geometry should be checked for self-consistency. A common feature of FEL detectors is their modular construction, often with independently movable active parts. It is therefore possible that the detector modules are no longer at the positions defined in the geometry file of the analysis software, in which case optimization is beneficial. Improving the detector geometry may improve subsequent iterations of indexing, particularly for frames with few or low-intensity peaks. Once these iterations are complete, the indexed frames can be merged.


10.8.5 Merging 


CrystFEL allows for both Monte-Carlo methods and more modern scaling and post-refinement methods. Once the lattice (or lattices) of the sample has been found, the task of assigning intensities to each reflection begins. Almost all reflections measured in SFX experiments are partial reflections, so an estimate of the full intensity is required. The predominance of partial reflections is due to the femtosecond exposure: the crystal is effectively stationary (zero oscillation), so few, if any, reciprocal lattice points fully cross the Ewald sphere.

In traditional MX diffraction experiments, the oscillation range can be optimized to maximize the number of full reflections in each image without causing overlap. In SFX, the combination of these hurdles requires novel approaches to achieve data pipelines comparable to those of the traditional approach.

SFX experiments often address projects with small crystals, which often do not crystallize or diffract well. A small crystal (on the order of <1 µm) can significantly change the diffraction patterns in a qualitative way, as smaller crystals provide relatively larger contributions from their shape transform. In fact, the number of features (“fringes”) aligned and spaced between neighboring Bragg peaks is proportional to the number of unit cells in the illuminated crystal. As the number of unit cells becomes greater, so does the number of features, though the intensities of these features become negligible, and we return to the regime of traditional Bragg diffraction. As the crystal becomes smaller and smaller, the shape transform becomes more prominent until the regime of single unit cells is reached, where data collection moves into the realm of single particle imaging (see SPI). Peak finding algorithms may need to correct for these features before accurate Bragg peak detection is possible.

The features of the shape transform found in small crystal diffraction are not necessarily problematic and can in fact provide information about the size and projected shape of the crystal through phase retrieval techniques [20]. Integration of isolated single peaks will likely underestimate the structure factor because of the shape transform. Kirian et al. (2010) demonstrate how Monte-Carlo integration of the features near and around Bragg peaks can yield the square of the structure factor magnitude [88]. To simplify the simulations, Kirian et al. (2010) used a measured lattice and symmetry from a photosystem I (PS1) dataset but reduced the large number of atoms (changing only the Bragg intensities) and then applied appropriate scaling so that the modelled structure factors agreed with the measured structure factors for PS1. The Monte-Carlo approach is an excellent way to remove or minimize the shot-to-shot variations in X-ray intensity and crystal size. However, the Monte-Carlo approach alone, without further intervention, may not yield correctly merged datasets, as SFX data may carry significant inherent indexing problems that would require extremely large amounts of data to overcome.
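The idea behind Monte-Carlo merging can be illustrated with a toy simulation. The sketch below is not the CrystFEL implementation: `simulate_partials`, the uniform distributions for the per-shot scale and the partiality, and the use of a single integer index per unique reflection are all assumptions made to keep the example short.

```python
import numpy as np


def simulate_partials(rng, true_intensity, n_shots, n_per_shot):
    """Toy SFX data: every shot records a few reflections with random partiality."""
    n_refl = true_intensity.size
    hkl = rng.integers(0, n_refl, size=(n_shots, n_per_shot))
    shot_scale = rng.uniform(0.2, 1.0, size=(n_shots, 1))  # pulse energy, crystal size
    partiality = rng.uniform(0.0, 1.0, size=(n_shots, n_per_shot))
    observed = shot_scale * partiality * true_intensity[hkl]
    return hkl.ravel(), observed.ravel()


def monte_carlo_merge(hkl_index, intensity, n_reflections):
    """Average all observations of each unique reflection."""
    sums = np.bincount(hkl_index, weights=intensity, minlength=n_reflections)
    counts = np.bincount(hkl_index, minlength=n_reflections)
    merged = np.full(n_reflections, np.nan)  # unobserved reflections stay NaN
    seen = counts > 0
    merged[seen] = sums[seen] / counts[seen]
    return merged, counts
```

With enough observations per reflection, the random scale and partiality factors average to a common constant, so the merged values become proportional to the full intensities. No single shot needs to contain a full reflection for this to work, which is why the number of frames matters so much.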

Issues arise when the symmetry of the Bravais lattice is higher than that of the space group, in which case there are two or four equally likely indexing options. Direct merging of these data can result in apparently perfectly twinned data, despite no actual crystal twinning being present. Traditionally this can be resolved by measuring full reflections but, as mentioned earlier, SFX data consist primarily of partial reflections [86]. Of the 65 space groups, 27 are subject to this indexing ambiguity. Liu et al. (2014) were able to overcome the ambiguity by using an expectation maximization algorithm. In brief, the algorithm compares each measured diffraction pattern with a three-dimensional model of the full reflection intensities on the reciprocal lattice. Correlation coefficients are computed between the pattern and the model for every possible indexing choice. The indexing choice that provides the highest correlation is then used to merge the image into the model. The updated model is used in the next iteration, and the algorithm continues until convergence [86].
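A toy version of this iterative scheme, for the simplest case of two indexing options, might look as follows. It is a sketch under strong assumptions and not the published algorithm: the ambiguity is represented by a hypothetical permutation `twin_op` of reflection indices, the model is seeded from the first pattern (which thereby defines the reference setting), and all names and noise levels are invented for the example.

```python
import numpy as np


def simulate_patterns(rng, n_reflections=300, n_patterns=200, n_obs=150):
    """Toy patterns; about half are recorded in the alternative indexing setting."""
    true_i = rng.exponential(100.0, n_reflections)
    twin_op = np.arange(n_reflections) ^ 1  # swaps reflection pairs (0,1), (2,3), ...
    true_flip = rng.random(n_patterns) < 0.5
    patterns = []
    for flipped in true_flip:
        h = rng.choice(n_reflections, size=n_obs, replace=False)
        val = true_i[h] * rng.uniform(0.5, 1.0) * rng.uniform(0.7, 1.0, n_obs)
        patterns.append((twin_op[h] if flipped else h, val))
    return patterns, twin_op, true_flip


def correlation(model, idx, val):
    """Correlation between a pattern and the model on their common reflections."""
    ok = ~np.isnan(model[idx])
    if ok.sum() < 3:
        return -1.0
    return np.corrcoef(model[idx][ok], val[ok])[0, 1]


def resolve_ambiguity(patterns, twin_op, n_reflections, n_iter=5):
    """Pick, for every pattern, the indexing option that best matches the model."""
    flips = np.zeros(len(patterns), dtype=bool)
    model = np.full(n_reflections, np.nan)
    model[patterns[0][0]] = patterns[0][1]  # seed: first pattern sets the reference
    for _ in range(n_iter):
        sums = np.zeros(n_reflections)
        counts = np.zeros(n_reflections)
        for k, (idx, val) in enumerate(patterns):
            cc_as_is = correlation(model, idx, val)
            cc_reindexed = correlation(model, twin_op[idx], val)
            flips[k] = cc_reindexed > cc_as_is
            chosen = twin_op[idx] if flips[k] else idx
            sums[chosen] += val
            counts[chosen] += 1
        # Merge with the chosen options to build the model for the next iteration.
        model = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return flips, model
```

Because the two settings are equivalent, the result is only defined relative to the seed pattern; either global choice gives a correctly merged, untwinned dataset.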

Algorithms published by Brehm and Diederichs (2014) have also solved the indexing ambiguity [85]. The authors focused on a dataset from PS1, incorrectly indexed as P6₃22, and were able to index it correctly as P6₃. Their approach was to represent each shot i as a vector x_i in k-dimensional space and to minimize the difference between (1 − r_ij) and (x_i − x_j), with r_ij the correlation coefficient between the intensities of shots i and j. Doing so results in clusters, or “clouds”, that become visible when the vectors are inspected graphically. These clouds represent the indexing modes and can be used to merge the data correctly.


10.8.6 Overview of Analysis 
Procedures 


As SFX experiments are in their infancy compared with traditional protein crystal diffraction experiments, so too are the programs that handle SFX data. This is not to say that the methods used are primitive, far from it; rather, they are not as universally used as traditional programs such as XDS [82], Aimless [89], and PHENIX [90]. It should be noted that this chapter does not aim to cover all the available programs, and those mentioned herein are not necessarily the definitive standard but rather a way to describe the common processes involved in data processing. Furthermore, several research groups use in-house algorithms optimized for their needs, and the reader is encouraged to explore whatever algorithms are most useful and applicable to their data.

CrystFEL [84], for example, is a useful program suite which provides a large number of tools for SFX data, including peak finding, beam centering, indexing, integrating, scaling, merging, and even simple simulations.


10.9 Single Particle Imaging (SPI)


10.9.1 Comparison with SFX


SFX and SPI experiments are strongly related. Both aim to determine 3D structural information from a set of 2D images, each recorded at a random orientation. SFX and SPI data analysis challenges are therefore also related, at least conceptually. The key difference is that SPI explores structures without the need for crystals, which makes it possible to observe a wider range of systems than is currently accessible to crystallography. However, the lack of the strong diffraction signal observed from crystals presents new challenges.


10.9.2 Overview of Analysis 
Procedures 


The data analysis workflow for FEL-based SPI is similar to that for SFX and also consists of three major steps: (1) identification of usable frames (“hits”) from a very large data set, in this case frames containing diffraction from a single particle, (2) orientation determination, and (3) 3D image reconstruction by phasing. All these steps are accompanied by validation procedures and are discussed step by step in the following paragraphs.


10.9.3 Calibration of Raw Data 


The weak signals observed in SPI experiments increase the importance of calibration data to obtain and confirm the geometry of each sensor module prior to the start of data collection. Data pre-processing for SPI is therefore a crucial and sensitive first step for a successful image reconstruction. Optimized geometry and calibration information from other techniques (such as SFX) can aid this optimization. However, accurate conversion to photon counts is especially important for SPI, as the average signal level per pixel is often well below a single photon [36, 91].


10.9.4 Hit Finding — Identification 
of Single Particles Only 


For a successful 3D image reconstruction in an SPI experiment, a massive data set measured from many identical particles is required, as the individual 2D diffraction patterns are randomly oriented and the signal from a single particle is relatively weak. The goal of this step is to identify single-particle hits among the majority of non-hits and multiple hits and (ideally) also to classify the particle-size distribution. Selection is usually based on a per-frame analysis of the calibrated 2D detector data. A basic version of such a procedure sums the total number of photons in a region of interest on the detector and identifies a frame as a hit if the sum is above a pre-selected threshold [37, 51], the so-called “lit pixel” method. Each “initial hit” selected in this way must then be classified more specifically as a single-particle hit, a multiple-particle hit, or a hit from a particle that is not sample (e.g., aggregates, solvent droplets). Currently available classification algorithms are mostly based on a spectral clustering method known as kernel-based Principal Component Analysis (PCA), which utilizes the nonlinear correlations over a wide range of length scales [37, 51, 92, 93]. Once the single-particle hits are identified, the size distribution of the particles can be determined, although the particles are typically already pre-selected in size to some extent, e.g., by using a centrifugal separator. Beyond analyzing the recorded 2D detector data, hit identification may be aided by independent information such as time-of-flight spectroscopy of debris particles generated by a hit [68].
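The basic photon-sum selection described above fits in a few lines. The sketch assumes that the frames are already calibrated to photon counts and stacked in one array; `find_hits`, the region-of-interest mask, and the threshold value are illustrative names and choices, not those of a facility pipeline.

```python
import numpy as np


def find_hits(frames, roi, threshold):
    """Flag frames whose photon sum inside the region of interest exceeds a threshold.

    frames: array (n_frames, ny, nx) of photon counts
    roi:    boolean mask (ny, nx) selecting the region of interest
    """
    scores = frames[:, roi].sum(axis=1)  # photons per frame inside the ROI
    return scores > threshold, scores
```

In practice the threshold is often chosen from a histogram of `scores` over a run, so that the large population of empty frames falls below it.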


10.9.5 Orientation Determination 


Well-defined Bragg peaks exhibiting symmetry, as observed in SFX experiments, facilitate the determination of the proper orientation of each image. In contrast, the more diffuse and lower-intensity diffraction patterns observed in SPI present a greater, though not insurmountable, challenge. The most direct way to group patterns of the same orientation is via cross-correlation analysis, although the computational cost of direct cross correlation of entire 2D diffraction intensity patterns is significant. Furthermore, the weak diffraction intensities typical of bio-SPI, combined with photon counting noise, make these calculations inaccurate. The so-called EMC (expand-maximize-compress) algorithm, which is based on the expectation maximization (EM) algorithm [94], offers improved outcomes compared to traditional common-line or correlation methods and has become a key orientation determination tool for SPI experiments [91].

The EMC algorithm iteratively estimates a 3D diffraction intensity pattern by maximum likelihood estimation between the estimated and measured diffraction intensity patterns. As its name suggests, the algorithm consists of three steps: expansion (E), expectation maximization (M), and compression (C). During the expansion step, a 3D diffraction intensity map (an initial model in the first iteration, the updated map thereafter) is expanded into 2D projected diffraction intensity patterns, similar to tomography data sets, at all possible orientations. The estimated 2D projections are then compared with the measured diffraction intensity patterns to update the orientation information during the expectation maximization step. All the intensity patterns with the same orientation are summed, and an updated 3D diffraction intensity pattern is reassembled from them in the compression step. After a few iterations, a final 3D diffraction intensity pattern is ready for the next step.
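The three steps can be made concrete with a deliberately reduced toy problem. In the sketch below the “3D intensity” is a one-dimensional periodic array, the unknown “orientation” of each frame is a cyclic shift, and the photon statistics are Poissonian. These simplifications, and the names `emc_iteration` and `run_emc`, are assumptions for illustration; real EMC implementations work on 3D grids with sampled rotations.

```python
import numpy as np


def emc_iteration(model, frames):
    """One expand-maximize-compress update for the cyclic-shift toy problem."""
    shifts = np.arange(model.size)
    # Expand: predicted pattern for every candidate orientation (cyclic shift).
    expanded = np.stack([np.roll(model, s) for s in shifts])
    # Maximize: Poisson log-likelihood of each frame under each orientation,
    # converted into orientation probabilities per frame.
    log_like = frames @ np.log(expanded + 1e-12).T - expanded.sum(axis=1)
    log_like -= log_like.max(axis=1, keepdims=True)
    prob = np.exp(log_like)
    prob /= prob.sum(axis=1, keepdims=True)
    # Probability-weighted sum of the frames assigned to each orientation.
    summed = prob.T @ frames
    # Compress: rotate every orientation back into the model frame and average.
    new_model = sum(np.roll(summed[s], -s) for s in shifts)
    return new_model / prob.sum()


def run_emc(frames, n_iter=20, seed=0):
    """Iterate from a random positive starting model."""
    rng = np.random.default_rng(seed)
    model = rng.random(frames.shape[1]) + 0.5
    for _ in range(n_iter):
        model = emc_iteration(model, frames)
    return model
```

As in the real problem, the reconstruction is only defined up to a global orientation (here an overall shift), and each update conserves the mean number of photons per frame.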


10.9.6 Phasing 


The last step of the data analysis workflow for FEL-based SPI is the phase retrieval process, or phasing, which is needed because the phase information in reciprocal space is lost during the measurement. The practical phase retrieval algorithms, namely error reduction (ER) [95] and hybrid input-output (HIO) [96], have been extended over the past decade or so to include some novel variations, e.g., guided HIO (GHIO) [97], shrinkwrap [98], and the ptychographic iterative engine (PIE) [99]. In SPI data analysis, both the ER and the HIO algorithms, each leveraging two major constraints known as the modulus and support constraints, have been widely used. Since small biomolecules can be considered weak phase objects, a positivity constraint, which only allows real positive numbers in the real-space image, may also improve the phase retrieval process. The aforementioned algorithms and constraints are applied to the final 3D diffraction intensity pattern obtained by the EMC algorithm, and a quantitative 3D image (an electron density) in real space can be obtained.
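A compact sketch of ER and HIO with modulus, support, and positivity constraints is given below. It works on arrays of any dimension through `numpy.fft.fftn`; the function names, the feedback parameter `beta = 0.9`, and the fixed iteration count are assumptions for the example, and refinements such as shrinkwrap are omitted.

```python
import numpy as np


def fourier_error(g, magnitude):
    """Relative mismatch between |FFT(g)| and the measured magnitudes."""
    return np.linalg.norm(np.abs(np.fft.fftn(g)) - magnitude) / np.linalg.norm(magnitude)


def phase_retrieval(magnitude, support, n_iter=200, beta=0.9, method="HIO", start=None):
    """Iterative phase retrieval; magnitude is the square root of the intensity."""
    if start is None:
        start = np.random.default_rng(0).random(magnitude.shape) * support
    g = np.array(start, dtype=float)
    for _ in range(n_iter):
        G = np.fft.fftn(g)
        # Modulus constraint: keep the current phases, impose measured magnitudes.
        G = magnitude * np.exp(1j * np.angle(G))
        g_new = np.fft.ifftn(G).real
        # Support and positivity constraints in real space.
        valid = support & (g_new >= 0)
        if method == "ER":
            g = np.where(valid, g_new, 0.0)
        else:  # HIO: feed back the violating part instead of zeroing it
            g = np.where(valid, g_new, g - beta * g_new)
    return g
```

A common strategy, stated here as general practice rather than as the procedure of any cited work, is to run many HIO iterations to escape stagnation and to finish with a few ER iterations so that the final image satisfies the real-space constraints exactly.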


10.10 Notes on the Need for and Use 
of Automation 


Both SFX and SPI, as discussed above, require many usable frames, not only because of the stochastic nature of sample injection but also because Monte-Carlo integration needs a large number of measurements to be useful. Current and foreseeable injection techniques cannot predict or determine sample rotation prior to analysis [58]; hence the data collection process resides in a parameter space much larger than that of traditional predetermined rotation schemes. Traditional crystallography will often yield a structure from fewer than 720 frames, allowing a crystallographer to check each frame manually if they so desire. In comparison, SFX and SPI data often require >10,000 usable frames. As successful SFX and SPI experiments may achieve “hit rates” (the fraction of X-ray pulses that happen to intersect a diffracting object of interest) of ~10% or lower, a usable data set can easily exceed 100,000 recorded frames.
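The arithmetic behind these numbers is simple but worth making explicit when planning beamtime. In the sketch below the function names are hypothetical, and the 120 Hz pulse rate used in the usage note is an assumed value for illustration only, not a figure taken from this chapter.

```python
import math


def frames_to_record(usable_frames, hit_rate):
    """Number of frames to record to obtain the requested number of hits."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be a fraction between 0 and 1")
    return math.ceil(usable_frames / hit_rate)


def collection_time_s(n_frames, pulse_rate_hz):
    """Pure exposure time in seconds, ignoring sample changes and downtime."""
    return n_frames / pulse_rate_hz
```

For 10,000 usable frames at a 10% hit rate, `frames_to_record(10_000, 0.10)` gives 100,000 frames; at the assumed 120 Hz this corresponds to roughly 14 minutes of uninterrupted exposure, and the requirement grows in inverse proportion as the hit rate drops.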

Given these large data sets, it would be unreasonable for anyone to check each frame manually, and automation of data analysis is therefore a necessity. Furthermore, as feedback from the analysis of collected data to the ongoing data collection (in as close to real time as possible) is of great benefit for the optimization and success of the experiment, automating the data analysis to provide this feedback in a structured way is a high priority. Each facility employs its own approach, which is related to its main scientific focus. These data analysis pipelines are constantly being improved through the integration of more complex algorithms and of possibilities for optimization for a given experiment, often in strong collaboration with the user community. It is therefore strongly recommended to discuss what data analysis and feedback are possible with the staff of the facility where you intend to collect data. These discussions during the planning stages not only enable a customized approach if needed but are also highly valuable for demonstrating the feasibility of a proposal.


Abstract


Small-angle X-ray scattering (SAXS) is a versatile technique that can provide unique insights into the solution structure of macromolecules and their complexes, covering the size range from small peptides to complete viral assemblies. Technological and conceptual advances in the last two decades have tremendously improved the accessibility of the technique and transformed it into an indispensable tool for structural biology. In this chapter we introduce and discuss several approaches to collecting SAXS data on macromolecular complexes, including several approaches to online chromatography. We include practical advice on experimental design and point out common pitfalls of the technique.


11.1 Introduction

Understanding the function and functional mechanism of macromolecular complexes is tightly linked to knowledge of their structure. Ideally, we would like to completely characterize the structure in the natural environment of the complex, the living cell [1, 2]. Indeed, recent approaches in super-resolution light microscopy enable direct studies of super-complexes, such as the nuclear pore complex [3, 4], meiotic chromosome axes [5] or the pericentriolar material (PCM) [6]. However, the complex environment of the cell can also limit our degree of control, making direct conclusions about causal relationships at the molecular level difficult or impossible. In addition, the resolution of fluorescence microscopy is intrinsically limited by its need for fluorescent labels [7]. Often, reducing the complexity of the system by isolating the object of interest can resolve these difficulties, thereby allowing functional studies to be performed on purified complexes in solution.

Small-angle X-ray scattering (SAXS) can provide structural information on macromolecules in solution at the nanometer scale. It is well suited to studying the conformation and/or conformational ensembles of macromolecular complexes. Recent examples include studies of the Tat:AFF4:P-TEFb complex involved in proviral transcriptional activation of HIV-1 [8], the binding of vaccinia virus (VACV) DNA polymerase E9 to DNA [9], the structure of the host factor for Q-beta phage (Hfq) of Escherichia coli in complex with mRNA [10], sub-complexes of the multisynthetase complex [11], ternary complexes of the C2 domain of coagulation factor VIII (FVIII) and inhibitory antibodies [12], and mannan-binding lectin (MBL) and MBL-associated serine protease-1 (MASP-1) complexes of the lectin pathway of the complement system [13].

Typical questions for a SAXS experiment would be “What is the solution structure of my complex?”, “How do complex formation and dissociation depend on the concentration of the partners?” and “Which factors can alter the complex or its formation?” There are many SAXS analysis tools available to address these kinds of questions, for example FoXSDock [14], SASREF [15], DAMMIF [16], DENFERT [17], or OLIGOMER. However, none of these can provide meaningful answers if the quality of the SAXS data is insufficient. It is therefore imperative to design SAXS experiments well and to collect high-quality data to avoid misinterpretations based on experimental artefacts.

In this chapter, we will discuss different 
approaches for collecting SAXS data for 
complexes, when it is appropriate to use them, 
and which pitfalls need to be avoided. Figure 11.1 
illustrates the typical SAXS workflow. 


11.2 Dilution Series SAXS 


The classical BioSAXS experiment consists of 
recording a series of dilutions of the sample of 
interest at known concentrations; see, e.g., [18]. A 
typical setup first measures buffer, sample at con- 
centration 1, buffer, sample at concentration 
2, buffer, sample at concentration 3, and a final 
buffer measurement. This setup has numerous 
advantages: 


• High-quality data from low-concentration samples. For large complexes as little as 0.5 mg/mL is sufficient.

• Experiments can be performed on any SAXS setup, including in-house instruments, as low photon flux can be compensated by increased exposure time.

• Low sample volumes of 30–50 µL are required. This is of special interest for additive screenings.

• In general, macromolecular concentrations are well defined and known, allowing more accurate mass estimates from forward scattering. This is of particular interest for flexible systems, for which volume-based approaches (Porod volume, correlated volume, etc.) struggle.

• When the macromolecular concentration is controlled, it is possible to investigate its effect on the scattering signal and draw conclusions on the mixing state.

• The static nature of the sample means that complementary experiments (e.g., UV-Vis spectroscopy, dynamic light scattering) can be performed on the same sample or under identical conditions with relatively low technical effort.

• It is often easier to achieve high macromolecular concentrations when not relying on online purification. This improves the signal-to-noise ratio, in particular at higher scattering angles where the signal is often weak.

• On high-flux instruments, e.g., synchrotron beamlines, individual measurements are relatively fast (1–5 min), enabling easy access to slow kinetics [19].
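The mass estimate from forward scattering mentioned in the list above relies on the Guinier approximation, ln I(q) = ln I(0) − (Rg²/3) q², together with a reference protein of known mass measured under the same conditions. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the fit range is limited to q·Rg < 1.3 (the usual rule of thumb for globular particles), and the reference-based mass estimate assumes similar contrast and partial specific volume for sample and standard.

```python
import numpy as np


def guinier_fit(q, intensity, q_rg_max=1.3, max_iter=10):
    """Estimate Rg and I(0) from a linear fit of ln I(q) against q^2."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    use = np.ones(q.size, dtype=bool)
    for _ in range(max_iter):
        slope, intercept = np.polyfit(q[use] ** 2, np.log(intensity[use]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope: check sample quality")
        rg = np.sqrt(-3.0 * slope)
        new_use = q * rg < q_rg_max  # restrict the fit to the Guinier region
        if new_use.sum() < 3 or np.array_equal(new_use, use):
            break
        use = new_use
    return rg, np.exp(intercept)


def mass_from_forward_scattering(i0, conc, i0_std, conc_std, mass_std):
    """Molecular mass relative to a standard of known mass: I(0)/c scales with mass."""
    return mass_std * (i0 / conc) / (i0_std / conc_std)
```

Because the estimate scales with I(0)/c, any error in the concentration propagates directly into the mass, which is why the well-defined concentrations of a dilution series are an advantage here.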


For stable complexes, a dilution series experiment is therefore often the best way to obtain high-quality SAXS data. For example, M. Karlsen and co-workers recently determined the low-resolution structure of the dimeric and tetrameric complexes of the BAR domain protein PICK1 [20] by deconvoluting SAXS data of the equilibrium dimer-tetramer system. Cordeiro et al. went one step further, combining SAXS and molecular dynamics simulations to obtain the low-resolution structure of PCNA-p15 complexes and their dissociation constant at the same time [21].
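When the scattering curves of the individual species are known, the simplest form of such a deconvolution is a linear decomposition of the mixture curve. The sketch below illustrates only this idea and is not the method used in the studies cited above: it assumes known, consistently scaled component curves, uses ordinary least squares with clipping instead of a proper non-negative solver, and the function name is hypothetical.

```python
import numpy as np


def component_fractions(mixture, components):
    """Fractional weights of known component curves in a mixture curve."""
    basis = np.asarray(components, dtype=float).T  # shape (n_q, n_components)
    weights, *_ = np.linalg.lstsq(basis, np.asarray(mixture, dtype=float), rcond=None)
    weights = np.clip(weights, 0.0, None)  # negative contributions are unphysical
    return weights / weights.sum()
```

Repeating this fit across a concentration series gives the population of each species as a function of concentration, from which an equilibrium constant could in principle be estimated.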

Fig. 11.1 Workflows of SAXS experiments. To obtain SAXS data of a complex from “static” samples, multiple samples at different absolute and relative concentrations are necessary (a). For each of these samples, individual SAXS curves are collected (c, d). The information from these curves can be combined into an idealized SAXS curve of the complex suitable for SAXS-based modelling (f). To obtain SAXS data of a complex via SEC-SAXS, a sample is applied to the column and data are collected directly on the eluent (b, c). The resulting set of continuous scattering curves (e) can be deconvoluted to obtain the SAXS curve of the complex (f).

Even if the composition of a sample is not concentration dependent, it is still necessary to measure several concentrations, as additional phenomena such as inter-particle scattering or aggregation can still affect the signal in a concentration-dependent manner.
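One common way to use a concentration series is to subtract the flanking buffer measurements, normalize by concentration, and extrapolate each point of the curve linearly to zero concentration. The sketch below illustrates this under the assumption that concentration effects are linear in c; the function names are hypothetical and all curves are assumed to share the same q grid.

```python
import numpy as np


def subtract_buffer(sample, buffer_before, buffer_after, concentration):
    """Subtract the mean of the flanking buffers and normalize by concentration."""
    buffer_mean = 0.5 * (np.asarray(buffer_before) + np.asarray(buffer_after))
    return (np.asarray(sample) - buffer_mean) / concentration


def extrapolate_to_zero_concentration(curves, concentrations):
    """Linear extrapolation of I(q)/c to c = 0, done independently for each q."""
    coeffs = np.polyfit(concentrations, np.asarray(curves, dtype=float), 1)
    return coeffs[1]  # intercepts, one per q value
```

If the normalized curves already superimpose at low q, the extrapolation changes little and the highest-concentration curve can be used for its better signal-to-noise ratio; a systematic trend with concentration points to inter-particle effects or concentration-dependent association.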

The standard strategy that is advised for 
investigating a stable complex or homo-oligomer 
requires the following measurements: 


• SAXS concentration series of the individual components.
• SAXS concentration series of the complex.


For non-stable complexes, it is advisable to also vary the mixing ratios:


• SAXS concentration series of the individual components.

• Several SAXS concentration series at different mixing ratios of the components.


Note that for this kind of experiment, precise knowledge of the number and respective size of the constituents of your system is essential. This requirement is directly linked to the disadvantages of this approach, which include:


• Good background subtraction is essential, i.e., the buffer used for correction must be identical to that of the samples at all concentrations. This is ideally achieved by dialysis, although carefully designed and performed spiking, diafiltration or desalting column runs can serve as alternatives.

• Any “contaminants” such as aggregates, higher-order oligomers or degradation products that are not explicitly taken into consideration during analysis and interpretation will invalidate most conclusions drawn from the experiment. These often form during freeze-thawing of the protein. If possible, checking your sample with dynamic light scattering prior to measurement will alert you to any larger aggregates.


11.3 Inline Purification SAXS: Fighting Polydispersity


As mentioned above, obtaining the scattering curve of the monodisperse complex is essential for any further structural analysis. While such a curve can sometimes be obtained by deconvolution from SAXS data of mixtures, even for non-stable complexes, we would ideally measure it directly to reduce the number of possible systematic errors. As association and dissociation of complexes and oligomers are generally not instantaneous, purification directly before data collection can provide scattering curves of monodisperse systems.

Chromatography techniques are particularly 
well suited to be coupled to SAXS data acquisi- 
tion. As the sample passes through the porous 
matrix of the column, it separates by size, affinity 
or charge. The most common chromatography 
method used in combination with SAXS is size 
exclusion (SEC). Continuous data collection of 
the chromatography-coupled-SAXS experiment 
has several advantages over the more conven- 
tional concentration series SAXS experiments: 


• Separation of large (e.g., aggregates) or small (e.g., excess buffer components) contaminants.

• Elution of the sample in the running buffer facilitates correct background matching, provided the column is equilibrated correctly.

• Peak shifts can directly reveal binding. In addition, calibrated SEC allows simultaneous determination of the Stokes radius based on the peak position [22].

• Large structural variability (e.g., domain movements) within a macromolecule/complex can be revealed.


In the decade following the introduction of SEC- 
SAXS [23], it has become a standard approach 
and is available in many small-angle scattering 
facilities, including small-angle neutron scatter- 
ing (SANS) instruments [24-30]. 

There are however disadvantages to SEC- 
SAXS that also need to be considered: 


• Higher protein concentrations are required due to column dilution effects, ideally upwards of 5 mg/mL.

• Experiments should be performed at synchrotrons or other high-photon-flux instruments.


In this experimental design, it is important to consider separation properties. Often, the apparent size of a complex is close to that of one of its constituents, even though their masses might differ significantly. The two are therefore difficult to separate, and it is essential to eliminate any excess of this constituent, e.g., by adding its partner in excess.

Choosing the right size exclusion column for 
your experiment is essential for the success of a 
SEC-SAXS experiment. Ideally, the elution peak 
should be well centered between the void volume 
and the total column volume to avoid overlaps 
with aggregates and/or excess small molecules. 
This implies that more than one column type 
might be necessary to study a complex and all its 
constituents. Additionally, some macromolecules 
might be able to interact with the column medium, 
resulting in abnormal elution or even conforma- 
tional changes. In general, concentration, flow 
rate, and volume dictate the quality of separation. 

Sometimes, even online purification cannot completely separate different species, either because they are too close to each other in size or because of fast dynamics. Just as for dilution-series measurements, it is possible to deconvolute the SAXS data to obtain idealized SAXS curves of the individual components.

The program COSMICS uses the fact that the 
information content of SEC-SAXS data can be 
increased by emphasizing changes in different 
parts of the curve (e.g., Kratky representation) 
and applies the multivariate curve resolution 
alternating least squares (MCR-ALS) chemometric 
method to obtain “pure” curves [31, 32]. 

The SEC-SAXS package of US-SOMO 
decomposes the chromatogram into individual 
peaks of pre-defined shape [33]. In addition, it 
can also reduce the influence of capillary spoiling 
due to material deposition on the final data. 

The finite size of individual peaks allows 
evolving factor analysis (EFA) to deconvolute 
overlapping peaks [27, 34]. A SAXS specific 
implementation can be found in BioXTAS 
RAW [35]. 
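The core of evolving factor analysis is easy to illustrate: singular values are computed for a growing window of frames, once starting from the first frame (forward) and once from the last (backward), and the frame at which each additional singular value rises above the noise marks where a component starts or stops eluting. The sketch below shows only this first stage, uses a hypothetical function name, and is not the BioXTAS RAW implementation; the subsequent rotation of the singular vectors into physical scattering curves is omitted.

```python
import numpy as np


def evolving_singular_values(data, n_keep=4):
    """Forward and backward singular values of a SEC-SAXS data matrix.

    data: array (n_frames, n_q) of buffer-subtracted scattering curves.
    """
    n_frames = data.shape[0]
    forward = np.zeros((n_frames, n_keep))
    backward = np.zeros((n_frames, n_keep))
    for i in range(n_frames):
        sv = np.linalg.svd(data[: i + 1], compute_uv=False)  # frames 0..i
        forward[i, : min(n_keep, sv.size)] = sv[:n_keep]
        sv = np.linalg.svd(data[i:], compute_uv=False)  # frames i..end
        backward[i, : min(n_keep, sv.size)] = sv[:n_keep]
    return forward, backward
```

Plotting each column of `forward` and `backward` against the frame number gives the start and end of each component's elution window, which is the information the finite peak width provides.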

One important requirement for all these approaches is that the SAXS signal of each individual contributor must be constant. Large-scale flexibility (e.g., movement of complex partners relative to each other) therefore renders these approaches unfeasible.

Of course, online purification is not limited to 
SEC. For very large multi-subunit macromolecu- 
lar complexes or complexes that dissociate under 
shear stress, differential ultracentrifugation can be 
better suited [36]. 

For complexes that cannot be separated by 
size, or which are prone to aggregation during 
the concentration steps, separation by physical 
and chemical properties might be more appropri- 
ate. For example, ion exchange chromatography 
separates based on surface charge. As the bulk 
protein concentration is low prior to elution, high 
concentrations can be achieved without losses to 
aggregation [37]. 

Of particular interest for complexes is online 
affinity chromatography, especially if one of the 
constituents is insoluble on its own [38]. 

Since all these approaches rely on buffer variation for the separation, correct background subtraction is non-trivial but manageable. These approaches are also less routinely available at SAXS instruments, so the necessary equipment should be discussed with the local contact beforehand.


11.4 Buffer Considerations 


Although SAXS measurements can be performed 
in almost any buffer, a few considerations should 
be kept in mind when selecting the buffer for a 
given experiment: 


* Contrast: the strength of the SAXS signal
depends on the difference in electron density
between the buffer and the macromolecules.
Try to keep the concentrations of all buffer
components as low as possible.

* The salt concentration in the buffer should be
ideally between 20 and 300 mM. High salt
concentrations reduce the contrast and consequently
increase the noise level. Low salt
concentrations increase undesired inter-particle
scattering.

* Preferentially use potassium salts in
experiments with RNAs.

* Viscosity and surface tension: keep viscosity
low because at many facilities the sample
needs to be handled by a pipetting robot.

* Glycerol content should be kept as low as
possible (max. 20% (v/v)). Note that
glycerol-containing buffers slow down column
equilibration. This is relevant for both
SEC-SAXS experiments and for buffer
matching prior to experiments. Up to 10 column
volumes may be required to equilibrate a
column sufficiently for SEC-SAXS.

* Only use reducing agents if absolutely
required. Which agent is appropriate depends
on the sample and needs to be tested.

* Do not use protease inhibitors in the final
sample.
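
The numeric guidelines in the list above can be collected in a small pre-flight check. The following is only a minimal Python sketch: the function name `check_saxs_buffer` and its parameters are hypothetical, and the thresholds are simply the ones stated above (20 to 300 mM salt, at most 20% (v/v) glycerol, no protease inhibitors).

```python
def check_saxs_buffer(salt_mM, glycerol_percent, has_protease_inhibitors=False):
    """Return a list of warnings for a planned SAXS buffer (illustrative only)."""
    warnings = []
    # Salt window quoted in the text: ideally 20-300 mM
    if salt_mM < 20:
        warnings.append("salt below 20 mM: inter-particle scattering may increase")
    elif salt_mM > 300:
        warnings.append("salt above 300 mM: reduced contrast, higher noise")
    # Glycerol: as low as possible, at most 20% (v/v)
    if glycerol_percent > 20:
        warnings.append("glycerol above the recommended maximum of 20% (v/v)")
    elif glycerol_percent > 0:
        warnings.append("glycerol present: allow up to 10 column volumes for equilibration")
    if has_protease_inhibitors:
        warnings.append("protease inhibitors should not be in the final sample")
    return warnings


print(check_saxs_buffer(100, 0))     # 100 mM salt, no glycerol
print(check_saxs_buffer(500, 25))    # too much salt and glycerol
```

Such a check cannot replace testing the sample itself; it only flags values outside the ranges given in this section.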


As the buffer composition might affect the struc- 
ture and/or stability of a complex, additional con- 
trol experiments might be necessary if 
complementary techniques require different buffer 
conditions (e.g., no salt). These measurements 
might result in suboptimal SAXS data but might 
still help to confirm or exclude structural changes. 


11.5 Analysis of Macromolecular 
Complexes 


Assuming the data collection of a macromolecular
complex was successful, the data sets need to
be analyzed. Primary processing creates idealized
scattering curves and determines SAXS
invariants [39]. A variety of software tools, such as
ATSAS [40], ScAtter or BioXTAS RAW [35], help
with all these steps for both dilution series and
SEC-SAXS.

Modeling and model validation are generally 
more specific for the problem at hand. Before 
attempting any SAXS-based modeling, one 
should know the answers to the following 
questions: 


* Does the SAXS curve represent a (structurally)
monodisperse system? Or are different oligomeric
states present?

* Is there only one kind of component (e.g., only
protein)? Or are there different kinds of
components (e.g., protein/DNA complexes)?

* Are there flexible regions?

* Does complex formation affect the shape of
the constituents?


Flexible systems generally require atomistic 
ensemble approaches to modeling, e.g., EOM 
[41], MES [42] or SASSIE [43]. Protein/DNA 
complexes do not require special precautions 
when building atomistic models, but for building 
ab initio models the difference in scattering contrast
between protein and nucleic acid needs to be
considered, e.g., with MONSA [44].

For both atomistic and ab initio modeling it is 
possible to include oligomer formation. 


11.6 Presenting “Complex” Data 
for Publication 


When presenting SAXS data of complexes for 
publication it is recommended to follow the 
IUCr’s publication guidelines for small-angle 
scattering data from biomolecules in solution 
[45]. Especially for non-stable complexes, it is 
important to precisely explain how sample 
monodispersity was ensured. 

Additionally, one should not only present the 
complex data itself, but — if attainable — also 
SAXS data of the constituents. If different models 
for a complex (e.g., different interaction 
interfaces) can be constructed, comparisons 
between rejected models and the recorded data 
should also be presented. 


11.7 Case Study 


Vaccinia is the prototype member of the 
Poxviridae. This family contains important 
pathogens, such as the smallpox-causing virus 
Variola, with large double-stranded DNA 
genomes that replicate exclusively in the 


11 Small-Angle X-Ray Scattering for Macromolecular Complexes 


cytoplasm of the host cell, where the replication is 
solely dependent on virus-encoded proteins 
[46]. E9 represents the catalytic subunit of the 
viral DNA polymerase. Over the years, a number 
of genetic and biochemical studies have 
characterized this protein [47], but only very
recently was a high-resolution structure of full-length
E9 solved at 2.7 Å resolution [9]. The protein
displays the classical palm, thumb, finger, exonuclease,
and N-terminal domains of a family B
polymerase in an open conformation. The high-resolution
structure permitted the identification of
important poxvirus-specific structural insertions 
[9]. Furthermore, E9 is the target of several 
antivirals [48, 49]. It is therefore of high medical
interest to determine not only the apo structure but
also the structure of the
DNA-bound state, in order to understand resistance
mutations against polymerase inhibitors.

Related published family B polymerase 
structures in complex with DNA oligonucleotides 
indicate that considerable domain movements 
occur upon DNA binding leading to closed 
structures compared to the apo forms. To study 
the domain movements of E9, SEC-SAXS
experiments on an isomorphous exo-minus mutant
(without exonuclease activity) bound to a 29-mer
DNA hairpin were performed and compared to
models of E9 in elongation mode based on the
yeast pol δ structure bound to DNA (PDB
ID 3IAY).

The E9 exo-minus/29-mer hairpin DNA complex,
mimicking template and primer strand, was
formed by mixing the protein with a 20% molar
excess of the DNA. 50 µL of this mix was
injected onto a SEC column (Superdex
200 Increase 5/150 GL column, GE Healthcare)
in-line with the flow cell for SAXS measurements
at the BioSAXS BM29 beamline at the European
Synchrotron Radiation Facility (ESRF) in
Grenoble, France. The column was equilibrated
with 3 column volumes of 20 mM Tris-HCl, pH
7.5, 100 mM NaCl prior to the experiment.
1000 frames of 1 s exposure time per frame
were collected at a flow rate of 0.3 mL min⁻¹,
covering 1.7 column volumes. The individual
frames were processed automatically and
independently within the EDNA framework
[50], yielding radially averaged curves of
normalized intensity vs. momentum transfer
q = 4π sin θ/λ, where 2θ is the scattering angle
and λ the X-ray wavelength. The frames that corresponded to
the elution of the protein/DNA complex, the protein
alone and the DNA were individually merged
and analyzed further using the tools of the
ATSAS package [51]. Comparison to apo E9
exo-minus showed that formation of the complex
between E9 exo-minus and the 29-mer DNA hairpin
reduced the radius of gyration from 3.83 nm for
the apo form to 3.45 nm, directly implying DNA
binding in a cavity of E9.
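
To illustrate how a radius of gyration is obtained from a merged curve, the following Python sketch applies the standard Guinier approximation, I(q) ≈ I(0) exp(−q²Rg²/3), which holds at small angles (roughly q·Rg < 1.3). It is not the procedure implemented in ATSAS; the function names and the synthetic, noise-free data are assumptions made purely for illustration.

```python
import math


def momentum_transfer(two_theta_rad, wavelength_nm):
    """q = 4*pi*sin(theta)/lambda, with 2*theta the scattering angle."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength_nm


def guinier_rg(q, intensity):
    """Least-squares line through ln I versus q^2; returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # equals -Rg^2 / 3
    intercept = mean_y - slope * mean_x    # equals ln I0
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic Guinier region for a particle with Rg = 3.45 nm (hypothetical data)
rg_true, i0_true = 3.45, 100.0
q = [0.05 + 0.01 * k for k in range(30)]   # nm^-1, so q*Rg stays below ~1.2
intensity = [i0_true * math.exp(-(qi * rg_true) ** 2 / 3.0) for qi in q]

rg_fit, i0_fit = guinier_rg(q, intensity)
print(round(rg_fit, 2), round(i0_fit, 1))
```

With real data the fit range has to be chosen carefully, because aggregation or inter-particle effects distort the lowest angles.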

Different models of E9 exo-minus with bound
DNA were built based on its high-resolution crystal
structure and the structure of yeast pol δ bound
to DNA, independently of the SAXS
experiments. Comparing the scattering curves
predicted by CRYSOL [52] for
each of these models with the SAXS data allowed
the models to be evaluated. Adjusting
only the position of the thumb domain of E9,
which is expected to show the strongest movement,
was not sufficient; only when all individual domains of E9
were adjusted did the theoretical curve of the E9
exo-minus/DNA complex match the observed
scattering curve very well (lowest χ², Fig. 11.2).
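
The ranking of candidate models by their agreement with the data can be sketched as follows. The example uses the common reduced χ² after least-squares scaling of the model curve; this is a generic definition and not necessarily identical to what CRYSOL computes internally. The function name and all numbers are hypothetical toy values.

```python
def scaled_chi2(i_exp, sigma, i_model):
    """Reduced chi-square after least-squares scaling of the model curve."""
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_model, sigma))
    scale = num / den                       # optimal scale factor for the model
    resid = sum(((e - scale * m) / s) ** 2
                for e, m, s in zip(i_exp, i_model, sigma))
    return resid / (len(i_exp) - 1), scale


# Toy data: model_a differs from the data only by a scale factor,
# model_b has a different shape and should therefore fit worse.
i_exp = [10.0, 8.0, 5.0, 2.0, 1.0]
sigma = [0.5, 0.4, 0.3, 0.2, 0.1]
model_a = [5.0, 4.0, 2.5, 1.0, 0.5]
model_b = [5.0, 4.5, 3.5, 2.0, 1.5]

for name, model in (("model_a", model_a), ("model_b", model_b)):
    chi2, scale = scaled_chi2(i_exp, sigma, model)
    print(name, round(chi2, 3), round(scale, 3))
```

The model with the lowest χ² is preferred, but rejected models and their fits remain informative and are worth reporting alongside it.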

In this experiment SAXS provided essential 
insights into the conformational changes of E9 
upon DNA binding. Understanding these changes 
greatly enhances our possibilities for developing 
drugs targeting E9. 


11.8 Conclusions 


Small-angle X-ray scattering is a powerful and 
versatile tool for structural biology. Applied 
correctly, it can provide unique insights into 
structures and structural dynamics of 
bio-macromolecular complexes. 

The success of a SAXS experiment depends to 
a large extent on the sample preparation. Due to 
the increased popularity of the technique, there is 
a plethora of recent literature with advice on 
sample (and buffer) preparation for successful 
experiments.

Fig. 11.2 Comparison of the scattering data of the E9
exo-minus/29-mer hairpin DNA complex to different
models. To obtain the best fit (orange curve) it is necessary
to adjust the position of all domains of E9. Adjusting only

Abstract


High-resolution structure determination by 
electron cryo-microscopy underwent a step 
change in recent years. This now allows 
study of challenging samples which previ- 
ously were inaccessible for structure determi- 
nation, including membrane proteins. These 
developments shift the focus in the field to 
the next bottlenecks which are high-quality 
sample preparations. While the amounts of 
sample required for cryo-EM are relatively 
small, sample quality is the key challenge. 
Sample quality is influenced by the stability 
of complexes which depends on buffer com- 
position, inherent flexibility of the sample, and 
the method of solubilization from the mem- 
brane for membrane proteins. It further 
depends on the choice of sample support, 
grid pre-treatment and cryo-grid freezing 
protocol. Here, we discuss various widely
applicable approaches to improve sample
quality for structural analysis by cryo-EM. 


Keywords 


Electron cryo-microscopy - Sample 
preparation - Sample stability - Sample 
optimization - Grid preparation - Membrane 
proteins 


12.1 Introduction 

Electron cryo-microscopy (cryo-EM) emerged 
in 2013 as a technology capable of providing 
structural insights into biological macromolecules 
at near-atomic resolution. Technical advances in 
cryo-EM and in image processing made this revolution
in resolution possible [1-3] (see Chap. 13
by J.L. Carrascosa). The breakthrough is mainly 
based on the implementation of direct electron 
detectors that record movies, rather than images, 
with unprecedented speed and quality. Increased 
computation power and new image processing 
software allow correction for sample movements 
during imaging and help to deal more efficiently 
with sample heterogeneity. Using this new
technology, even smaller specimens can be structurally
characterized by cryo-EM. The highest
resolution reported to date (2018) is the 1.8 Å
cryo-EM structure of 334 kDa glutamate
dehydrogenase [4].

In the last 5 years, cryo-EM structures deposited
in the EM data base (https://www.ebi.ac.uk/pdbe/emdb/)
reached an average resolution of
11.2 Å (Fig. 12.1a). At this resolution it is not
possible to recognize amino acid side chains and 
this precludes building of atomic models without 
prior structural knowledge. However, it is very 
interesting to notice that the average resolution 
improved more than threefold between 2014 
(22.3 Å) and 2018 (6.9 Å), highlighting the
potential of cryo-EM for many different samples 
(Fig. 12.1b) and the increasing access to state-of- 
the-art cryo-microscopes and detectors. 

The reason for limited resolution of many 
cryo-EM structures is inherent to the technique: 
In single particle cryo-EM, the purified, imaged 
macromolecular particles are assumed to be iden- 
tical. Accordingly, it is possible to computation- 
ally average over many identical, but noisy 
images of particles — their identity being deter- 
mined in a sorting process called classification 
[5-7]. Averaging improves the signal-to-noise

Fig. 12.1 Analysis of EM structures. (a) Numbers of
EM structures in the EM data bank (EMDB) by resolution
range. Only EM structures released since 2014 were taken
into account. De novo atomic model building is usually
possible only for structures with a resolution of better than
4 Å. (b) Classification of 584 high-resolution cryo-EM
structures according to biological function or specific

ratio and thus the resolution in the resulting 2D
class averages. High-resolution 2D class averages 
showing secondary structure features can most 
readily be obtained from homogeneous subsets 
of particles, which are present in high-quality 
samples. 2D classification also provides a means 
to identify sample heterogeneity and to computa- 
tionally purify the sample. 
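
The benefit of averaging can be illustrated with a toy simulation. The Python sketch below assumes perfectly aligned, identical one-dimensional "particles" with independent Gaussian noise, which is a strong idealization of real cryo-EM images; all names and parameters are hypothetical. Under these assumptions the residual noise of the average falls roughly with the square root of the number of particles.

```python
import random
import statistics


def average_images(images):
    """Pixel-wise mean of equally sized images (lists of floats)."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def residual_noise(image, signal):
    """Standard deviation of the difference between an image and the true signal."""
    return statistics.pstdev([a - b for a, b in zip(image, signal)])


rng = random.Random(0)                       # fixed seed for reproducibility
signal = [1.0 if 150 <= i < 250 else 0.0 for i in range(400)]   # toy 1D "particle"
noise_sd = 2.0                               # noise is stronger than the signal

noise = {}
for n_particles in (1, 16, 256):
    images = [[s + rng.gauss(0.0, noise_sd) for s in signal]
              for _ in range(n_particles)]
    noise[n_particles] = residual_noise(average_images(images), signal)
    print(n_particles, round(noise[n_particles], 3))
```

If the particles are not identical, as in a heterogeneous sample, averaging blurs the differences instead of suppressing only the noise, which is why classification into homogeneous subsets comes first.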

Sample heterogeneity can be caused by the 
presence of contaminants. These are usually sim- 
ple to identify by sample quality control, for 
instance by size-exclusion chromatography, 
Coomassie-stained SDS PAGE gels and mass 
spectrometry. Similarly, the sub-stoichiometric 
presence or absence of subunits/interaction 
partners in macromolecular complexes is a source 
of sample heterogeneity. This type of heterogene- 
ity usually can be addressed by a clever choice of 
tags for affinity purification fused to minority 
components of a complex, improved protein 
expression systems and/or additional purification 
steps. 


[Fig. 12.1 panel labels: EMD-8829, EMD-9575; 70S ribosome EF-Tu; Zika virus]

properties in terms of architecture or cellular location.
Only structures with a resolution better than 4 Å
were considered. One example is presented for each cate- 
gory with EMDB entry number (EMD-8194 [4], 
EMD-9539 [91], EMD-9575 [92], EMD-8773 [93], 
EMD-8847 [94], EMD-8829 [95])

A second source of heterogeneity, which is
more difficult to address, is the dynamics of mac- 
romolecular complexes. In fact, macromolecular 
machines often undergo a number of conforma- 
tional changes to perform their function. Such a 
sample then consists of a mixture of different 
conformational states. This leads to limited 
resolution of the resulting cryo-EM map if the 
different conformations cannot be sorted out bio- 
chemically or computationally. 

Sample heterogeneity caused by flexibility/ 
dynamics of the sample cannot be detected read- 
ily by standard protein biochemistry. Sample flex- 
ibility has to be revealed by additional quality 
control of the sample, such as dynamic light scat- 
tering or negative stain EM analyses (Fig. 12.2). 
Negative stain EM provides high contrast images 
and therefore allows assessment of contaminants, 
sample concentration and homogeneity [8]. 2D 
class averages from negative stain EM can reveal 
different conformations of a sample. Negative 
stain EM analyses can also indicate if the sample 
adopts one preferred conformation or if a specific 
conformation of the sample can be stabilized, 
e.g., by addition of conformation-specific 
antibodies, inhibitors, substrate analogues and/or 
mutagenesis. These approaches are adopted from 
X-ray crystallography where the crystal lattice 
further purifies the proteins and forces the sample 
to adopt one specific conformation [9]. 

State-of-the-art image processing can deal 
with conformational flexibility of samples in 
cases where the sample adopts a series of discrete 
states. It is then possible to solve the cryo-EM 
structures of the different states. To date, 
maximum likelihood-based classification is com- 
monly used in cryo-EM to sort for differences in 
composition and/or conformation in the sample 
[10, 11]. It identifies structurally homogeneous 
subset(s) of particles and leads to 3D 
reconstructions that can be refined to higher reso- 
lution. However, 2D and 3D classification cannot 
resolve conformational flexibility which is due to 
continuous motion of parts of the sample. Cur- 
rently, very flexible parts of the structure are 
masked out, and the structure determination is 
then limited to the rigid parts of the sample 
which adopt a defined state [12, 13]. However,
masking out more flexible, dynamic parts of
macromolecules leaves them poorly defined, 
requiring new biochemical approaches and addi- 
tional manipulations to improve sample homoge- 
neity to be able to visualize such flexible regions 
at high-resolution. 

Here, we review innovative methodological 
approaches to improve cryo-EM sample quality, 
in particular sample stability and homogeneity 
(Fig. 12.2). Biochemical methods to analyze and 
optimize sample purity and stability are discussed 
in detail. In addition, we focus on existing tools 
and new developments for cryo-grid preparation, 
a critical and often underestimated step in 
cryo-EM. We place special emphasis on mem- 
brane proteins and their complexes which are 
particularly difficult to prepare and characterize 
structurally. Notably, high-resolution structures 
of several biologically important, much sought- 
after membrane protein complexes, which 
previously could not be solved by crystallogra- 
phy, were recently determined by cryo-EM 
(Fig. 12.3). We highlight new trends in mem- 
brane protein sample preparation for cryo-EM, 
which contributed to a wealth of novel high- 
resolution cryo-EM structures. 


12.2 Approaches for High-Quality 
Sample Preparation 


At first sight, sample preparation for single parti- 
cle cryo-EM is straightforward and much simpler 
compared to crystallography or nuclear magnetic 
resonance spectroscopy (NMR) as neither 
crystals nor labeling is required. However, 
samples must be imaged in a high vacuum in 
EM, which is incompatible with biological 
samples in aqueous buffer solutions. Vitrification, 
which was first established by Jacques Dubochet 
(Nobel prize in Chemistry 2017), overcomes this 
limitation of EM by flash-freezing the sample and 
imaging at liquid nitrogen temperatures [14]. In 
practice, a small volume of sample is applied onto 
a carbon-coated grid, made from a metal mesh 
such as copper, gold, nickel, molybdenum, rho- 
dium or tungsten with different mesh sizes 
(200-400 mesh grids are commonly used) and

Fig. 12.2 Schematic overview of a typical cryo-EM
sample preparation workflow. The properties of the 
specimen have to be considered first to identify an appro- 
priate expression system to produce structural biology- 
grade material. Then, purification and quality control 
(QC) by negative stain EM is carried out to verify sample 
homogeneity and stability. Often, multiple rounds of puri- 
fication optimization and QC using the techniques 
discussed in the main text are required. For homogeneous
samples, freezing conditions that yield an ideal ice thickness
must be determined. This can be achieved by varying
sample concentration, grid type, grid pre-treatments, appli- 
cation/plunging method, and buffer conditions (see main 
text). Screening of the cryo-grids prepared under different 
conditions will indicate if further optimization is required 
before continuing with data acquisition. Subsequent data 
analysis (including 2D class averaging, initial model generation,
3D classification) could lead to a satisfactory

Fig. 12.3 Structural analysis of membrane proteins.
(a) Numbers of unique membrane protein structures 
published by year (total of 796 PDB entries). Inset: contri- 
bution of cryo-EM to the unique membrane protein 
structures (97 entries). (b) Outer ring: Classification of 
97 unique cryo-EM structures of membrane proteins (all 
published since 2013, all with <10 Å resolution)
according to their biological function. Examples of mem- 
brane protein cryo-EM structures are shown for each 


with a thin carbon support film on one side. 
Subsequently, most of the sample is blotted 
away with filter paper, and then the grid with a 
thin aqueous film of the sample is flash frozen in 
liquid ethane. The freezing must occur at high 
speed (faster than 10* K/sec) to avoid formation 
of ice crystals and to favor formation of amor- 
phous, vitreous ice [15]. Vitrification prevents 
biological specimens from drying out in the 
high vacuum of the EM, preserves the native 
structure of the macromolecules and keeps them 
immobilized during imaging. 

It can be rather time-consuming and require 
significant amounts of samples to identify optimal 
grid-freezing conditions: Cryo-grids suitable for 
data collection require good particle concentra- 
tion and distribution in thin ice. The ideal ice 


category with EMDB entry number (EMD-3061 [23], 
EMD-6224 [43], EMD-3245 [44], EMD-8410 [45], 
EMD-8653 [46], EMD-7325 [47], EMD-3951 [48]). The 
inner ring represents unique structures determined by all 
methods, including crystallography and NMR (796 in 
total), using the same colors for the classes. Unclassified 
structures are shown in white. Data taken from Stephen 
White’s webpage ‘membrane proteins of known 3D struc- 
ture’ (http://blanco.biomol.uci.edu/mpstruc) 


thickness is just a bit thicker than the 
macromolecules embedded, ensuring the best 
possible signal-to-noise ratio (Fig. 12.2). The 
contrast/signal of macromolecules in cryo-EM is 
generally low because it is mostly provided by C, 
O, N and H atoms which are also present in 
the solvent. Due to the similar density of 
macromolecules and solvent, it is easier to 
visualize very large macromolecular complexes 
in vitreous ice. 


12.2.1 Selecting the Optimal 


Expression System 


In the past, purification of macromolecular 
complexes was mostly performed from native 




Fig. 12.2 (continued) reconstruction. However, it may 
also indicate that the ice conditions are not ideal or that 
the sample or cryo-grid preparation requires further 


optimization. Once a suitable 3D reconstruction of the 
sample is obtained from image processing the final steps 
of atomic model building and validation can begin

sources [16]. For megadalton-sized complexes
such as ribosomes and their complexes, it is still 
the only viable preparation method (Fig. 12.1b). 
Bacterial expression systems, in particular 
Escherichia coli, remain the preferred sources of 
individual recombinant proteins and smaller 
complexes because they are simple, fast and 
cost-effective [17]. More “difficult to express” 
eukaryotic multi-domain proteins or large macro- 
molecular complexes usually require eukaryotic 
expression systems, the most popular being Sac- 
charomyces cerevisiae, baculovirus-insect cell 
expression and mammalian cell-based systems 
[18, 19]. 

The optimal expression system is defined by 
specific features of the sample. These may 
include sequence and size of the proteins as well 
as activity, folding (the requirement of specific 
chaperones) and post-translational modifications 
(PTMs). PTMs, such as acetylation, phosphoryla- 
tion, addition of lipids or carbohydrates, can 
be essential to protein activity and can alter 
the proteins’ physicochemical nature. Moreover, 
overexpression of certain proteins can be 
tolerated in one system but be toxic in another. 
While S. cerevisiae is the least expensive and
most convenient eukaryotic protein expression
system, baculovirus-insect cell systems have
a more sophisticated chaperone machinery for 
folding mammalian proteins and can achieve 
more complex PTMs. For mammalian proteins, 
the proper conformation and modifications are 
most likely to be achieved by expression in mammalian
cells. More distant expression systems
may provide higher yields, but may not add all PTMs
that are found in the native protein in the cell.
Therefore, overexpression in the endogenous or a 
closely related host organism is usually the pre- 
ferred approach for production of a correctly 
assembled, fully active protein complex. 

Examples for “difficult to express” proteins 
include eukaryotic kinases and membrane protein 
complexes: Human SMG1 kinase (~420 kDa), a 
member of the phosphatidylinositol 3-kinase- 
related kinase family involved in genome stability 
and nonsense-mediated mRNA decay, can be 
expressed in its active, auto-phosphorylated 
form in human HEK293T cells. Its yields are 


A. Deniaud et al. 


further increased when the interacting protein 
SMG9 is co-expressed [20, 21]. In contrast, 
SMG1 overexpression using baculovirus-insect
cell expression yielded only insoluble, aggregated 
proteins (unpublished). Eukaryotic membrane 
protein complexes often require the presence of 
specific lipids and glycosylation for stability and 
activity, which is difficult to achieve in recombinant
expression systems: Human γ-secretase,
comprising four subunits each with a transmem- 
brane domain, was produced in human HEK293F 
cells by co-expression of all four subunits from a 
single expression vector [22, 23]. In the resulting 
high-resolution cryo-EM structure of γ-secretase,
lipids were identified bound to specific sites on 
the transmembrane domains of the proteins and 
eleven glycosylated residues were present in the 
nicastrin subunit as judged by the EM density 
(Fig. 12.4) [23]. 


12.2.2 Sample Quality Control 
and Buffer Optimization 


Biochemical purification procedures traditionally 
use buffers that mimic the cellular environment of 
the sample, i.e., a close-to physiological pH and 
additives such as glycerol or sucrose to enhance 
stability. However, glycerol, sucrose and 
detergents are known to interfere with cryo-grid 
preparation as they reduce the contrast of the 
sample in vitreous ice. To obtain optimal cryo- 
EM data for a purified specimen it is therefore 
advised to optimize its buffer and assess the need 
for additives. 

For crystallography, protein homogeneity is 
usually assessed under multiple conditions by 
dynamic light scattering [24] or by Thermofluor/ 
differential scanning fluorimetry (DSF) [25]. For 
Thermofluor/DSF analysis, a dye is added to the 
sample which is quenched in aqueous solution 
and becomes hyper-fluorescent in an apolar envi- 
ronment, e.g., when exposed to hydrophobic 
residues of a protein. However, DSF is not suit- 
able for analysis of large complexes because 
it measures protein stability as the midpoint of 
the unfolding transition, assuming cooperative 
unfolding. This assumption is not valid for most

Fig. 12.4 Low CMC and detergent-free systems for
cryo-EM analysis of membrane proteins. (a) Chemical
structure of LMNG and cryo-EM structure of agonist-bound
adenosine A2A receptor in LMNG (red, agonist:
black), heterotrimeric G protein (yellow: mini-Gsg,
orange-red: βγ subunits) and nanobody (blue) at 4.1 Å
resolution [33]. (b) Schematic illustration of amphipol
A8-35 [51] and cryo-EM structure of γ-secretase with


complexes as they usually do not unfold coopera- 
tively, but in a multi-state process of complex 
dissociation and unfolding of the subunits. 

A DSF-based method, named ProteoPlex, 
screens for conditions of monodispersity and 
increased stability of large macromolecular 
complexes [26]. ProteoPlex comprises a software 
for DSF data analysis in order to identify buffer 
conditions leading to a single melting temperature 
transition of a complex [26]. Switching from 
multiple unfolding transitions to a single transi- 
tion is a sign of sample monodispersity, and an 
increased melting temperature indicates better 
sample stability. To achieve this, a large panel 
of buffer conditions can be screened: pHs, salt 
concentration, diverse additives, as well as 


[Fig. 12.4 panel labels: Nanodisc; Amphipol A8-35; EMD-8118]


amphipol [23]. The four subunits are depicted in different 
green colors. Representative glycan and lipid residues 
(dark blue) are labeled. (c) Schematic illustration of a 
nanodisc with an embedded integral membrane protein 
[54] and cryo-EM structure of TRPV1 channel in a 
nanodisc [96]. The four subunits are depicted in pink- 
purple colors; lipids (dark blue) are labeled. All EM 
densities are depicted in transparent grey 


available specific ligands, inhibitors, nanobodies 
or antibody fragments. 
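
The idea of counting unfolding transitions in a DSF melting curve can be sketched in a few lines of Python. This is not the ProteoPlex algorithm, which is described in [26]; the sketch merely locates local maxima of the first derivative dF/dT of an idealized, noise-free curve. The sigmoid model, the threshold `min_height` and all function names are assumptions for illustration.

```python
import math


def sigmoid_melt(t, tm, width=1.5):
    """Idealized two-state unfolding signal rising from 0 to 1 around tm."""
    return 1.0 / (1.0 + math.exp(-(t - tm) / width))


def transition_midpoints(temps, fluor, min_height=0.02):
    """Temperatures at which dF/dT has a local maximum above min_height."""
    # Central differences; deriv[j] belongs to temps[j + 1]
    deriv = [(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
             for i in range(1, len(temps) - 1)]
    midpoints = []
    for j in range(1, len(deriv) - 1):
        if (deriv[j] > min_height and deriv[j] > deriv[j - 1]
                and deriv[j] >= deriv[j + 1]):
            midpoints.append(temps[j + 1])
    return midpoints


temps = [20.0 + 0.5 * k for k in range(141)]            # 20 to 90 degrees C
single = [sigmoid_melt(t, 55.0) for t in temps]          # one cooperative transition
double = [0.5 * sigmoid_melt(t, 45.0) + 0.5 * sigmoid_melt(t, 65.0)
          for t in temps]                                 # two separate transitions

print(transition_midpoints(temps, single))
print(transition_midpoints(temps, double))
```

In a buffer screen, a condition that yields a single transition at a higher temperature would be preferred over one that yields several transitions. Measured curves are noisy and would need smoothing or model fitting before such a peak search is meaningful.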

Notably, the pH optima identified using DSF 
often differ significantly from physiological pH 
(7.0-8.0); they range from pH 4 to 9 and peak 
around pH 6.5 [27]. Most cryo-EM structures 
found in the EMDB have been solved at a pH 
between 7 and 8.5, indicating that the stability of 
these specimens was not assessed systematically. 
Likely, these structures were determined at 
non-optimal buffer conditions. 

Based on the analyses of ~80 macromolecular 
complexes, Chari and Stark predicted that the best 
buffer conditions for crystallization and cryo-EM 
of macromolecules are likely to be very similar 
[26, 27]. For example, using pH 6 instead of pH 8
led to a five-fold increase in purification yields of
AAA+ ATPase p97. In negative stain EM, intact,
homogeneous oligomers of p97 were observed
rather than the partial aggregates that were
obtained from previous purifications [26]. This
exemplifies that ProteoPlex is very useful not only
for sample quality control and optimization of
buffer conditions for EM analyses, but also for
improved purification under conditions where
the complex is most stable.


12.2.3 Improving Size and Stability: 
Nanobodies and Fusion 


Proteins 


Cryo-EM is challenging for small proteins 
(<100 kDa), in particular small membrane 
proteins. Because of the difficulty in crystallizing 
membrane proteins or solving their structure by 
NMR (699 unique structures since 1985), cryo- 
EM provides an attractive alternative (Fig. 12.3a). 
97 unique cryo-EM structures of membrane 
proteins from different classes were solved since 
2013 (Fig. 12.3b), demonstrating the power of 
the technology. One means to overcome the 
problems associated with a small sample is to 
increase the protein size, for instance with the 
help of specific Fab antibody fragments (Fabs) 
or nanobodies (Fig. 12.4a). Nanobodies consist 
of a single monomeric variable antibody domain 
and have a molecular weight of ~15 kDa. 

For example, the cryo-EM structure of 
a ~130 kDa heterodimeric ATP binding cassette 
(ABC) transporter TmrAB from Thermus 
thermophilus has been reported at ~8 Å resolution
using a specific 50 kDa Fab fragment [28]. The 
Fab fragment not only increased size but also 
aided the determination of particle orientations 
[28]. Using the same strategy, the cryo-EM struc- 
ture of the HIV integrase dimer (65 kDa) in com- 
plex with two specific Fabs (50 kDa each) was 
reported at 10 Å resolution [29].

Nanobodies are commonly used for purifica- 
tion and detection of proteins. Moreover, when a 
complex is flexible and fails to achieve high- 
resolution EM reconstructions, a very attractive 
strategy is to use antibodies/nanobodies that
stabilize a specific conformation. For instance,
using a specific nanobody, an 8.7 Å cryo-EM
reconstruction has been achieved for the A/V- 
type ATPase rotary motor from Thermus 
thermophilus [30]. The presence of the nanobody 
improved the local resolution of the map, i.e., the 
rigidity of the region where it bound to the 
ATPase. Similarly, nanobodies have been used 
to analyze or reduce the flexibility of G protein- 
coupled receptors (GPCRs) as well as to increase 
the size of the sample (Fig. 12.4a) [31-33]. 
Another promising approach to increase the
size of the protein of interest for cryo-EM
is to fuse it to a
well-ordered, symmetric oligomer. For instance,
maltose-binding protein (MBP, 40 kDa) was 
covalently linked to glutamine synthetase (GS) 
which forms dodecamers [34]. The resulting 
cryo-EM structure yielded 4 Å resolution for GS 
and 6-10 Å for MBP [34]. For this, the linker
between the two proteins had to be optimized to 
minimize the flexibility of the fused MBP. The 
resolution was lowest at the junction between the 
proteins. Consequently, this approach may be
limited by the requirement to optimize the linker
for each new protein fusion so as to minimize
flexibility and enforce 12-fold symmetry [34].


12.2.4 Improving Stability: Polyproteins and Crosslinking


Polyproteins can help overcome the problem of 
unstable complexes that dissociate during purifi- 
cation or grid preparation. Polyproteins are cova- 
lently linked individual proteins and are used in 
nature by many viruses for efficient protein
production. For structural biology they have been 
used to achieve the correct stoichiometry of 
proteins forming a complex and to stabilize the 
resulting complexes [35, 36]. The cryo-EM struc- 
ture of the E. coli ribosomal targeting complex in 
the ‘closed’ conformation critically relied on 
stabilizing the complex between the signal recog- 
nition particle (SRP) and its receptor FtsY. This 
was achieved by covalently linking the SRP pro- 
tein Ffh with FtsY into a single polypeptide chain 
without compromising activity [37]. 

Alternatively, purified complexes can be
stabilized by mild chemical crosslinking. The 
simplest approach is to use batch reactions. 
The optimal concentration of crosslinker is deter- 
mined in SDS-PAGE as the minimal concentra- 
tion required for a complete shift of all subunits of 
a complex into a higher-molecular weight band. 
For instance, the lysine-specific crosslinker BS3
(bis(sulfosuccinimidyl)suberate) was successfully
used to stabilize Target of Rapamycin Complex 2
(TORC2), which was purified from source.
The corresponding cryo-EM structure of TORC2
reached 7.9 Å resolution [38].
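
The titration criterion described above (the minimal crosslinker concentration that shifts all subunits into the high-molecular-weight band) can be sketched in a few lines of Python. The concentrations and band fractions below are hypothetical illustration values, not data from [38], and the 95% threshold used to call a shift "complete" is an assumption.

```python
# Hypothetical densitometry readout from an SDS-PAGE titration:
# crosslinker concentration (mM) -> fraction of subunit signal found
# in the high-molecular-weight band.
titration = {0.0: 0.00, 0.25: 0.40, 0.5: 0.78, 1.0: 0.97, 2.0: 0.99, 4.0: 1.00}


def minimal_crosslinker(readout, complete=0.95):
    """Return the lowest concentration whose shifted fraction is >= `complete`.

    `complete` is an assumed threshold for calling the band shift complete.
    Returns None if no tested concentration reaches it.
    """
    for conc in sorted(readout):
        if readout[conc] >= complete:
            return conc
    return None


print(minimal_crosslinker(titration))
```

Choosing the lowest concentration that passes the threshold, rather than the highest tested one, reflects the aim of keeping the crosslinking mild.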

GraFix is a more elaborate method that combines
separation by size in density gradient
centrifugation with mild chemical crosslinking
using glutaraldehyde [39]. About 100 µg of sample
is loaded onto a glycerol and glutaraldehyde gradient.
During centrifugation the sample is exposed to 
increasing concentration of the glutaraldehyde 
crosslinker and simultaneously size-fractionated 
in the density gradient. This favors intramolecular 
crosslinks and removes any complexes with inter- 
molecular crosslinks due to their increased size. 
For cryo-EM, glycerol is removed from the buffer 
using a desalting column [39]. ‘On-column’
crosslinking follows a similar principle: β2 adrenergic
receptor-G protein complexed with
β-arrestin was crosslinked during size-exclusion
chromatography, leading to enrichment of a 
‘tight’ conformation [32]. 
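
To illustrate how a GraFix-type gradient exposes the sample to increasing crosslinker while it sediments, the following minimal sketch tabulates an idealized linear gradient. The gradient limits (10-30% glycerol, 0-0.15% glutaraldehyde) and the number of fractions are assumed for illustration only and are not taken from [39].

```python
def linear_gradient(top, bottom, n_fractions):
    """Concentration at the centre of each fraction of an ideal linear gradient."""
    step = (bottom - top) / n_fractions
    return [top + step * (i + 0.5) for i in range(n_fractions)]


# Assumed, illustrative gradient limits in percent (top -> bottom of the tube).
glycerol = linear_gradient(10.0, 30.0, 10)
glutaraldehyde = linear_gradient(0.0, 0.15, 10)

for i, (gly, ga) in enumerate(zip(glycerol, glutaraldehyde), start=1):
    print(f"fraction {i:2d}: glycerol {gly:5.1f}%  glutaraldehyde {ga:5.3f}%")
```

Larger particles travel further into the tube and therefore meet higher crosslinker concentrations, while aggregates held together by intermolecular crosslinks end up in fractions that can be discarded.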

It should be noted that chemical crosslinking 
leads to reduced conformational heterogeneity, 
favoring the most compact state of the sample. 
Sometimes, the resulting compact structure does 
not represent the conformation the sample usually 
adopts in solution. It is therefore advisable to 
compare the crosslinked and native sample, e.g., 
using negative stain EM. 




12.2.5 Stabilizing Membrane Proteins 


Since 2013, structures of many important mem- 
brane protein complexes were determined by 
cryo-EM [40] (Fig. 12.3b). Ground-breaking 
examples are the transient receptor potential channel
TRPV1 [41], human γ-secretase [22, 23]
(Fig. 12.4b, c), and the 1 MDa
mammalian NADH:ubiquinone oxidoreductase 
(respiratory complex I) [42]. Examples of high- 
resolution cryo-EM structures exist for virtually 
all classes of membrane proteins (Fig. 12.3b): 
enzymes, photosystems, pore-forming channels,
secretion systems, receptors, ion channels and
transporters [23, 43-48]. Many of the high-resolution
membrane protein structures additionally
show tightly bound lipids, metal ions, glycan
residues and bound ligands (Fig. 12.4).


12.2.5.1 New Amphiphilic Detergents 
Solving high-resolution structures of membrane 
protein complexes requires the latest technology 
and optimized sample preparation techniques, 
including the choice of suitable detergents. To 
date, it is unclear which detergents work best 
for cryo-EM studies of membrane proteins. 
An analysis of recent cryo-EM structures of membrane
proteins and of the detergents used for their
solubilization indicates that a large number of
detergents are compatible with cryo-EM, including
n-dodecyl-β-maltoside, Cymal-7, digitonin
and Tween-20 [49]. For cryo-EM, detergents
with a low critical micellar concentration (CMC)
such as Lauryl Maltose Neopentyl Glycol
(LMNG) and detergent-free systems (amphipols,
nanodiscs, SMA or saposins) have become very
popular in recent years (Fig. 12.4).

High detergent concentrations (above the 
CMC) are required for solubilization and purifi- 
cation of membrane proteins. These high 
concentrations often prevent thin ice formation 
during grid preparation. Moreover, the detergent 
micelles around membrane proteins diminish the 
contrast in vitreous ice. Finally, free detergent 
micelles can be mistaken for protein particles,
complicating image processing. 

To address the need for high detergent concen- 
tration, GraDeR, a variation of GraFix, has 
been established [50]. Two inverse gradients are 
used for ultracentrifugation, with increasing 
concentrations of glycerol and decreasing LMNG 
concentrations. The LMNG structure resembles a 
lipid with two aliphatic chains (Fig. 12.4a). LMNG 
has a very low CMC (in H2O ~0.01 mM/0.001%
w/v) and an extremely slow off-rate, which allows
the LMNG concentration to be lowered temporarily below
the CMC without observing protein aggregation.
Centrifuging membrane proteins into LMNG 
concentrations below the CMC avoids the pres- 
ence of detergent micelles and facilitates grid prep- 
aration [50]. As a proof of principle, GraDeR was 
successfully applied to bacterial V-type ATPase, a 
Caenorhabditis elegans gap junction channel and 
mammalian F1Fo ATPase, significantly improving
the quality of the cryo-grid preparations [50]. 
Similarly, LMNG alone has facilitated cryo-EM
sample preparation. For instance, the cryo-EM
structure of the adenosine A2A receptor
bound to an agonist (NECA), a heterotrimeric G
protein (comprising an engineered mini-Gs and
βγ subunits) and nanobody Nb35 was solved in
LMNG at 4.1 Å resolution [33] (Fig. 12.4a).
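
Because detergent concentrations are quoted both in % (w/v) and in molar units, a small conversion helper is useful when planning gradients around the CMC. The sketch below assumes a molecular weight of about 1005 g/mol for LMNG; this value is an assumption introduced here, not a figure from the text.

```python
LMNG_MW = 1005.19  # g/mol, assumed molecular weight of LMNG


def percent_wv_to_millimolar(percent_wv, mw):
    """Convert % (w/v), i.e. grams per 100 ml, into mmol per litre."""
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / mw * 1000.0


cmc_mM = percent_wv_to_millimolar(0.001, LMNG_MW)
print(f"0.001% (w/v) LMNG corresponds to {cmc_mM:.4f} mM")
```

With this assumed molecular weight, 0.001% (w/v) comes out at roughly 0.01 mM, i.e. about 10 µM.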


12.2.5.2 Amphipathic Polymers:
A Substitute for Detergent
Amphipols (amphipathic polymers) have been 
designed to stabilize integral membrane proteins 
[51]. They consist of polyacrylate
polymers that are highly soluble owing to hydrophilic
carboxylate groups in their side chains
(Fig. 12.4b). Hydrophobic side chains bind to the
hydrophobic surface of membrane proteins and 
thus stabilize the membrane protein’s native 
structure [51]. Because of multiple associations 
between the amphipols and the membrane protein, 
amphipols can replace detergents completely, 
avoiding the problems associated with detergent 
micelles. 

In the case of γ-secretase, replacing the detergent
digitonin by the amphipol A8-35 was critical
for achieving a high-resolution cryo-EM
structure [22, 23] (Fig. 12.4b). Other examples
include the 3.9 Å resolution cryo-EM structure of
the membrane-embedded part of vacuolar-type
ATPase from yeast [52] and a 3.4 Å cryo-EM
structure of the polycystic kidney disease-like channel
PKD2L1 [53], both of which were achieved
using amphipols.


12.2.5.3 Nanodiscs Mimic the Lipid 
Bilayer 

Removing proteins from their natural lipid environment
with detergents results in loss of native
interactions, e.g., with important lipid molecules
and other membrane proteins. This can lead to 
dissociation of membrane protein complexes. 
Moreover, detergents do not properly mimic the
membrane because a detergent micelle is a roughly
spherical aggregate and not a bilayer. In contrast,
nanodiscs offer a native-like lipid bilayer environ- 
ment [54]. Nanodiscs are composed of two 
identical membrane scaffolding proteins, MSPs, 
which each contain an amphipathic helix forming 
a belt around the hydrophobic regions of the lipid 
bilayer (Fig. 12.4c). The diameter of the nanodisc 
is determined by the length of the MSP helix, 
which can be varied to give diameters of 6 to 17 nm.
By optimizing the lipid, membrane protein and 
MSP protein ratio, it is possible to reconstitute 
membrane proteins in nanodiscs and to remove 
all detergents [54]. 

The first membrane protein cryo-EM 
structures in nanodiscs were the bacterial SecYE 
translocon in complex with the ribosome [55], the 
TRPV1 channel [41] (Fig. 12.4c) and the RyR1 
ryanodine receptor [56]. Notably, other high res- 
olution RyR1 cryo-EM structures were solved 
using detergent (Tween-20) [57] and mixed 
lipid-detergent micelles (CHAPS/lipids) [58], and
they are virtually identical to the RyR1 structure 
in nanodiscs [56]. This suggests that this mem- 
brane protein complex is particularly stable and 
compatible with various solubilization methods. 


12.2.5.4 Saposin-Based Nanoparticles 
and Styrene-Maleic Acid (SMA) 
Copolymers 
Detergent-free saposin-lipoprotein nanoparticles 
(‘Salipro’) are composed of lipids and saposin 
[59]. Saposin-like proteins are highly stable due 
to amphipathic helices and disulfide bridges 
[60]. Saposin A surrounds the lipids and the 
transmembrane domains of the membrane protein 
and thus keeps them in solution. The Salipro 
system is adaptable to various sizes of proteins. 
For example, saposin has been used to solve the
cryo-EM structure of the bacterial PepTSo2 peptide
transporter at 6.5 Å resolution [59].
SMA copolymers comprise styrene and maleic 
acid groups. These polymers are remarkable 
because they have been shown to solubilize 
lipid-protein complexes directly from raw
membranes or even cells [61]. Thus, SMAs 
avoid the detergent-solubilization step which is 
inherent to all strategies described above and 
which potentially disturbs the native membrane 
protein structure. This makes this copolymer par- 
ticularly attractive for membrane protein 
preparations and structural biology. Recently, 
SMA lipid particles (SMALPs) have been used 
to solve the 3.4 Å cryo-EM structure of the bacterial
alternative complex III [62]. The complex
participates in electron transport by catalyzing 
the oxidation of membrane-bound quinol and 
the reduction of cytochrome c [62]. 


12.3 Cryo-grid Preparation and Ice 
Thickness 


Successful cryo-grid preparation depends on 
many parameters, including the choice of grid, 
pre-treatment of grids, blotting conditions, the 
nature of the sample and buffer composition. In 
practice, grid preparations need to be optimized 
for new samples, and often many grids are 
screened to identify suitable areas on the grid for 
data collection. As discussed above, the presence 
of detergents, glycerol or sucrose in the buffer is 
known to cause problems with ice quality 
[15]. For instance, detergents lower the surface 
tension and make it more difficult to produce thin 
ice. Holey carbon grids with smaller holes some- 
times help to overcome this problem [63]. 

Other parameters that affect ice thickness 
include the blotting time and the post-blot incu- 
bation time before the grid is flash-frozen in liq- 
uid ethane. These parameters can be tightly 
controlled with semi-automated blotting and 
plunging devices. Moreover, the filter paper that
is used to remove excess fluid from the cryo-grid
can touch the grid from both sides, from the
sample application side only or from the ‘back’ side.
A longer incubation time before freezing the grid 
allows water to evaporate; the extent of evapora- 
tion depends on the temperature and humidity of 
the environment. Notably, evaporation also 
changes the buffer composition (pH and ionic 
strength) which can affect the protein structure. 

The sample support film (e.g., homemade
holey carbon film or commercial C-flat/Quantifoil 
holey carbon film), the size of the holes and 
its hydrophilicity are important parameters for 
grid preparation. Sometimes, remaining plastic 
on holey carbon film has to be removed 
using ethyl acetate [63] or using other protocols 
involving chloroform, acetone and/or ethanol. 
Hydrophilicity of the support film is required to 
allow complete wetting of the grid and formation 
of the ice layer. Hydrophilicity can be modified 
by exposure to UV radiation, glow discharge, 
plasma cleaning, poly-L-lysine or detergent treat- 
ment [64]. The extent of partition of the sample 
into the grid holes depends on the hydrophilicity 
of the support. Macromolecules often preferen- 
tially adsorb on the carbon film. This dilutes the 
concentration of particles in the holes. One solu- 
tion is to apply the sample twice on the grid, 
hoping that the first application saturates the 
carbon film. 

Affinity grids use a specimen-specific support 
film to immobilize particles, leading to a higher 
concentration of particles on the film and avoiding 
the interaction of particles with the water-air inter- 
face. Such functionalized surfaces include for 
example Ni-NTA-derivatized monolayers [65], 
antibody-coated carbon films [66] or crystalline 
streptavidin monolayers [67]. Of these, Ni-NTA 
and streptavidin-coated lipid monolayers were 
reported to lead to preferred orientations of 
particles on the grid. 

When particles interact with the water-air 
interface they often adopt a preferred orientation 
or even denature, compromising structure deter- 
mination. Approaches to overcome preferred ori- 
entation of particles include the addition of low 
amounts of detergents which concentrate at the 
water-air interface and thus protect proteins from 
the surface effects. Changing the glow-discharge 
parameters or any other treatment that affects the 
hydrophobicity of the support also can change 
particle distribution and orientation and thus can 
help to improve the particle view distribution in 
vitreous ice [63]. Collecting images of tilted specimens
is the least preferable solution to the preferred
view problem as it often prevents reaching
high resolution. 


12.3.1 New Grid Materials to Reduce
Particle Movement


Particle movement during irradiation of the 
specimen is caused by stage drift and/or beam- 
induced charging. In order to restore high- 
resolution information, motion correction is 
applied during image processing [68]. Motion 
correction showed that commonly used amorphous
holey carbon films are not very stable
during irradiation, as they deform and
are poor conductors [69].

Consequently, new support materials were 
tested to identify films that are more stable during 
imaging. Graphene and gold films were shown to
strongly reduce or nearly eliminate sample movement
[69, 70]. Gold (Au) grids with a gold support
film gave the best results, which is likely due to
gold being highly conductive and very radiation-hard,
i.e., more resistant to radiation-caused
alterations [69]. Accordingly, Au grids are now
commercially available and commonly used for 
high-resolution cryo-EM. Recent examples of 
their application include the 3.85 Å cryo-EM
structure of ASCT2, a human neutral amino
acid transporter [71], and the human HCN1
hyperpolarization-activated channel at 3.5 Å
resolution [72].

Notably, the most pronounced particle movement
occurs during exposure to the first 3-5
electrons per Å² [69]. Unfortunately, it is currently
not possible to computationally correct for
this movement. Therefore, the corresponding 
movie frames are discarded, despite the fact that 
these frames contain the most valuable high- 
resolution information as they are least damaged 
by irradiation. In the future, better grid supports 
which even further limit particle movement, com- 
bined with new software may allow processing of 
these images and thus could lead to a further step 
change in resolution achievable by cryo-EM. 
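
How many movie frames fall into the initial dose window depends on the dose delivered per frame. The following sketch counts the leading frames that overlap a chosen initial dose; the movie parameters (40 frames, 60 electrons per Å² in total) and the 4 electrons per Å² cutoff are hypothetical values chosen for illustration.

```python
import math


def frames_in_initial_dose(dose_per_frame, initial_dose):
    """Count the leading movie frames that overlap the first `initial_dose`.

    Both arguments are in electrons per square angstrom. A frame is counted
    as soon as any part of its exposure falls inside the initial dose window.
    """
    if dose_per_frame <= 0:
        raise ValueError("dose_per_frame must be positive")
    # Small epsilon guards against floating-point noise at exact multiples.
    return math.ceil(initial_dose / dose_per_frame - 1e-9)


# Hypothetical movie: 40 frames, 60 e-/A^2 in total, i.e. 1.5 e-/A^2 per frame.
n_frames = 40
dose_per_frame = 60.0 / n_frames
skip = frames_in_initial_dose(dose_per_frame, initial_dose=4.0)
kept = list(range(skip, n_frames))  # zero-based indices of retained frames
print(f"discard first {skip} frames, keep {len(kept)}")
```

Finer dose fractionation (more frames per unit dose) reduces how much exposure beyond the cutoff is lost together with the discarded frames.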


12.3.2 Cryo-grid Preparation: From
Pipetting-Blotting-Plunging 
Devices to New Automated 
Cryo-Grid Preparation 
Procedures 


Optimizing cryo-grid preparation can involve 
screening sample concentrations, buffer optimi- 
zation, different grid types and blotting 
conditions. Blotting-plunging robots (such as 
FEI Vitrobot, Leica GP, Gatan CP3) have been 
developed to make cryo-grid freezing more repro- 
ducible (see above). However, blotting with filter 
paper removes >99.9% of the sample, and it can 
lead to sample aggregation or denaturation. 
Moreover, the process of grid screening is 
low-throughput, leaving this process rather 
unpredictable. To address this bottleneck, new
cryo-grid preparation devices are being developed,
such as ‘Spotiton’ [73], a Microsprayer Chip
[74] and ‘CryoWriter’ [75].

Spotiton is a piezoelectric inkjet dispensing 
device, incorporated into a purpose-built 
vitrification robot [73]. It spots a very small volume
(~2-16 nl) of sample onto a novel self-blotting
grid on which the sample is spread to a thin
film, avoiding the use of filter paper [76].
Self-blotting grids are composed of copper and rhodium
(Cu/Rh) or palladium (Cu/Pd) and are covered
with nanowires on the copper surface. The
opposite side (Rh or Pd) is smooth (no nanowires) 
and holds the support film, holey carbon or gold. 
The sample is spotted with the dispensing device 
onto the nanowire side of the support and spreads 
across the support [76]. The nanowires act as 
blotting paper, rapidly removing the sample 
when it comes in contact, and leaving a thin liquid 
film which subsequently is flash-frozen. 

Spraying-plunging methods use microfluidic
devices with silicon-/polydimethylsiloxane-based
chips [74, 77]. A chip comprises a
micromixer with two inlets for the reactants (originally
built for time-resolved EM) and a
micro-reaction channel from which the reaction mix is
sprayed onto the grid. This device can also be 
used for normal samples, rather than reaction 
mixtures. By adjusting gas pressure and the dis- 
tance between the sprayer and the grid, the ice 
thickness can be controlled. Currently, rather
large sample volumes (9 µl/grid) are required. A
proof-of-principle study with apoferritin showed
that cryo-grids prepared with the microsprayer
can lead to a 3 Å cryo-EM structure [74].

The CryoWriter avoids the blotting and the 
spraying step. CryoWriter uses a microcapillary 
to directly apply and spread a very small volume 
of sample (~3-20 nl) onto the grid, such that the
drop spans from the tip to the grid surface 
[75]. During application, the grid is moved to 
spread the sample over the surface and to fill the 
holes in the support film. Real-time monitoring of
the thickness of the sample film makes it possible
to adjust the volume applied and to change the sample
film thickness prior to flash-freezing [75]. Sample
thinning is achieved by controlled water evapora- 
tion. This is possible because the device is not 
humidity-controlled. When the monitoring sys- 
tem (a laser diode and a photo detector) indicates 
that the desired liquid film thickness is reached, 
the grid is rapidly vitrified using a robotic arm. 


12.4 Imaging In-Focus: The Volta Phase Plate


The use of phase plates for imaging by electron 
microscopy in-focus was already suggested in 
1947 by Hans Boersch [78]. For light micros- 
copy, Fritz Zernike realized in-focus imaging. 
He developed a phase plate that shifts the phase 
of scattered light by 90° relative to the phase of 
un-scattered light [78]. For cryo-EM, in-focus 
imaging has only recently been achieved: 
Radostin Danev and colleagues discovered that 
a charged carbon film can be used to introduce the 
required phase shift for scattered electrons 
[79]. This ‘Volta Phase Plate’ yields an impressive
enhancement of contrast for in-focus images
[80]. The introduction of the phase plate is now
rapidly transforming single particle cryo-EM and
cryo-tomography [80, 81]. Contrast-enhanced 
cryo-EM using the Volta Phase Plate is particu- 
larly interesting for small proteins, including 
small membrane proteins. For example, 64 kDa 
hemoglobin was solved at 3.2 Å resolution [82]
and several GPCR complexes in the range of
150 kDa were reported. Examples include
agonist-bound calcitonin receptor with Gs protein
at 4.1 Å resolution [83] and agonist-bound glucagon-like
peptide-1 receptor complex at
3.3 Å [84].
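
The effect of a 90° phase shift on in-focus contrast can be illustrated with a simplified contrast transfer function (CTF). The sketch below is a textbook-style approximation under stated assumptions: amplitude contrast and envelope functions are ignored, positive defocus means underfocus, and the sign convention is one of several in use. The microscope parameters (300 kV, spherical aberration of 2.7 mm) are assumed example values.

```python
import math


def electron_wavelength_angstrom(voltage_volts):
    """Relativistic electron wavelength in angstrom for a given voltage."""
    h = 6.62607015e-34      # Planck constant, J s
    m0 = 9.1093837015e-31   # electron rest mass, kg
    e = 1.602176634e-19     # elementary charge, C
    c = 299792458.0         # speed of light, m/s
    ev = e * voltage_volts
    momentum = math.sqrt(2.0 * m0 * ev * (1.0 + ev / (2.0 * m0 * c * c)))
    return h / momentum * 1e10


def ctf(k, defocus_A, cs_mm, voltage_volts, phase_shift_rad=0.0):
    """Simplified phase-contrast transfer at spatial frequency k (1/angstrom).

    Amplitude contrast and envelopes are ignored (an assumption of this sketch).
    """
    lam = electron_wavelength_angstrom(voltage_volts)
    cs_A = cs_mm * 1e7  # mm -> angstrom
    chi = (math.pi * lam * defocus_A * k ** 2
           - 0.5 * math.pi * cs_A * lam ** 3 * k ** 4)
    return -math.sin(chi + phase_shift_rad)


k = 0.02  # 1/angstrom, i.e. a 50 angstrom feature
in_focus = ctf(k, 0.0, 2.7, 300e3)
with_plate = ctf(k, 0.0, 2.7, 300e3, phase_shift_rad=math.pi / 2)
print(f"in focus, no phase plate:   {in_focus:+.5f}")
print(f"in focus, 90 degree shift:  {with_plate:+.5f}")
```

Without a phase plate, low-resolution features are transferred with almost no contrast in focus, which is why conventional data are collected with defocus; the additional quarter-wave shift turns the sine-type transfer into a cosine-type one, so that contrast is close to maximal at low spatial frequencies.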


12.5 Challenges and Outlook 


Due to technological advances in recent years, cryo-EM
has become a more automated, higher-throughput
and more routine technique for high-resolution
structural biology. Future advances will
include next-generation direct electron detectors 
with higher sensitivity and faster readout as well 
as developments in image processing software 
addressing the problems associated with the initial 
movie frames. In parallel, new tools and routines 
for improved sample production and cryo-grid 
preparations are required. In fact, sample quality 
is the bottleneck in many projects. 

In particular, structures of small proteins and 
membrane protein complexes can now be solved 
by cryo-EM [82-84]. The challenges for mem- 
brane protein samples are multi-fold, including 
the choice of expression system to produce suffi- 
cient amounts of protein/complex, mild but effi- 
cient solubilization using detergents or detergent- 
free systems, purification of homogeneous protein 
sample and last but not least cryo-grid preparation 
(Fig. 12.2). It is key to find the best detergent and 
buffer conditions where the protein/complex is 
stable. If sufficient sample is available, buffer opti- 
mization is ideally achieved using high-throughput 
screening techniques, such as ProteoPlex [26]. Cur- 
rently, there is no general answer to the biochemi- 
cal problem of how to stabilize unstable and 
flexible specimens. Possible solutions, including 
antibody fragments [28, 29] or nanobodies [31], 
polyproteins [36, 37] or mild chemical cross- 
linking [38, 39] are discussed above. Establishing 
a protocol for cryo-grid preparation is a challenge
which currently is addressed by new grid materials 
[69, 70, 76] and novel sample application 
approaches [73-75] substituting for the pipetting- 
blotting-plunging technique. 

For membrane proteins, the optimal solution to 
many of the problems associated with protein 
extraction from the membrane would be in situ 
analysis at high resolution in their native membrane.
In fact, electron cryo-tomography
(cryoET) underwent a resolution revolution as
well, thanks to direct electron detectors, Volta 
phase plate and improved sample preparation 
techniques, such as cryo-focused ion beam (FIB) 
milling [85]. Similarly, improved software for 3D 
classification and subtomogram averaging tools 
contribute to increasing resolution of tomograms 
[86-88]. CryoET analyses of the ~56 MDa core 
of the nuclear pore complex at ~21 Å resolution
[89, 90] impressively highlight the great potential 
of this technique. 


Abstract


Recent advancements in cryo-electron micros- 
copy (cryo-TEM) have enabled the determina- 
tion of structures of macromolecular complexes 
at near-atomic resolution, establishing it as a 
pivotal tool in Structural Biology. This high 
resolution allows for the detection of ligands 
and substrates under physiological conditions. 
Enhancements in detectors and imaging 
devices, like phase plates, improve signal qual- 
ity, facilitating the reconstruction of even 
smaller macromolecular complexes. The 
100-kDa barrier has been surpassed, presenting 
new opportunities for pharmacological research 
and expanding the scope of crystallographic 
analyses in the pharmaceutical industry. Cryo- 
TEM produces vast data sets from minimal 
samples, and refined classification methods 
can identify different conformational states of 
macromolecular complexes, offering deeper 
insights into the functional characteristics of 
macromolecular systems. Additionally, cryo- 
TEM is paving the way for time-resolved 
microscopy, with rapid freezing techniques 
capturing snapshots of vital structural changes 
in biological complexes. Finally, in Structural 
Cell Biology, advanced cryo-TEM, through 
tomographic procedures, is revealing conformational
changes related to the specific subcellular
localization of macromolecular systems
and their interactions within cells. 


Keywords 


Cryo-electron microscopy (cryo-TEM) · Cryo-electron
tomography (CET) · Electron
microscopy (EM) · Direct electron detectors
(DED)


13.1 Introduction 

Electron microscopy (EM) has been a widely
used structure determination method in biomedical
applications, but only recently has it emerged
as a key technique for Structural Biology to
retrieve structures at molecular and atomic resolution
levels. This has been possible due to the
recent incorporation of technical advances that 
provided the elements to overcome the classical 
limitations of this technique. The requirement to 
prepare samples for analysis under vacuum, 
which required extensive fixation procedures, as 
well as the use of heavy metal stains imposed by 
the poor specific contrast of biomaterials, was 
overcome by the incorporation of fast freezing 
for sample preparation. A second source of 
improvements came from the data acquisition 
systems, which incorporated computer-controlled 
automation and the implementation of highly efficient
direct electron detectors, which made possible
data acquisition of electron images at a high
speed and with a high signal-to-noise ratio.
Last, but not least, the progressive improvement
in data processing, including sophisticated data
standardization, statistical analyses, and three-dimensional
reconstruction protocols, has
facilitated the collection of large amounts of
images and their optimized combination to render
three-dimensional experimental structures.

The incorporation of these complementary
optimized tools dramatically accelerated the
progress of EM over the last decades, culminating
in the recent “resolution revolution”, which
allows structures to be determined at
atomic resolution from two-dimensional projection
electron microscopic images. The success of
cryo-electron microscopy (cryo-TEM) has been 
acknowledged by the 2017 Nobel Prize in Chem- 
istry awarded to three main figures in the develop- 
ment of these methods for the high-resolution 
structure determination of biomolecules in solu- 
tion: J. Frank, R. Henderson and J. Dubochet. 
Deposition of structures solved by EM in the Electron
Microscopy Data Bank (EMDB at the Protein
Data Bank Europe, https://www.ebi.ac.uk/pdbe/emdb/)
has increased steadily, reaching more than
5200 in late 2017. Up to 2014, the average resolution
of these structures was relatively stable at
1.5 nm. This value has improved spectacularly
since then, reaching 0.8 nm in 2017.
Even more significantly, the best resolution
achieved has followed an impressive trend, reaching
below 0.2 nm in 2017.

A very interesting aspect of the structures
deposited in the EMDB is the size distribution
of the maps: in 2016, they ranged from
slightly less than 0.1 MDa (17 maps) up to structures
larger than 10 MDa (17 maps). The largest number of
structures solved that year corresponded to
complexes between 1 and 10 MDa (196), with
137 corresponding to 0.5-1 MDa, 109 to 0.25-0.5 MDa
and 41 to 0.1-0.25 MDa. This wide
variety of sizes amenable to structure determination by EM
highlights one of the most interesting strengths of
EM in Structural Biology: a large number of
complexes of very different sizes can be tackled
by EM irrespective of their complexity and size
(between 0.1 and several tens of MDa).

From the distribution of released maps at the 
EMDB as a function of the technique used, it is 
clear that a vast majority (77%) corresponds to 
structures of macromolecular complexes solved 
using single particle three-dimensional recon- 
struction methods [1]. Another 10% corresponds 
to complexes exhibiting helical symmetry or 
arranged as two-dimensional crystals, while
around 15% is derived from electron tomography.
The success of single particle reconstruction in 
cryo-TEM has been exemplified by the produc- 
tion between 2014 and 2016 of more than 
200 structures with resolution better than 
0.4 nm. A striking demonstration of the
possibilities of cryo-TEM in the Structural Biology
of macromolecules and macromolecular
complexes was the publication in 2016 of a
paper [2] in which the structure of glutamate dehydrogenase
(334 kDa) was solved at 0.18 nm, thus
breaking the 0.2 nm resolution barrier by EM. In
the same paper, the group of S. Subramaniam
presented the structure of isocitrate dehydrogenase
(with a mass of 93 kDa), demonstrating
that cryo-TEM was able to retrieve structures for
complexes below 100 kDa. Although the resolution
obtained in this case was 0.38 nm, it was
sufficient to identify conformational changes 
induced by binding of allosteric small-molecule 
inhibitors. 

Although structural determination of macromo- 
lecular complexes is key for understanding their 
function, it is becoming more and more evident 
that many of their properties, including those of 
biomedical interest, might be closely related to 
their localization in the cell and to their relation- 
ship with other molecular machineries. Thus, a 
more comprehensive analysis will be progres- 
sively required, and the proteome and interactome 
will have to merge with cell structure at the highest 
possible resolution to define not only structure but 
also interactions at the physiological level. This is 
where tomography methods based on electron
microscopy data find one of their most interesting
application areas. Cryoelectron tomography (CET)
combines the potential of three-dimensional reconstruction
from projections with a near-native
preservation of biological samples. From the maps
at the EMDB, around 15% correspond to data
obtained by CET. CET also benefits from the
recent technical advances mentioned above (cryo-preparation,
direct electron detectors, improved
reconstruction programs), and it presently offers
resolution levels that allow large macromolecular
complexes to be visualized in their functional cellular
environments. Resolutions of 3-4 nm are standard
in most of the released CET maps, and
subtomogram averaging is widely used to reach higher
resolution (since 2012, resolutions
below 1 nm have been obtained), thus
supporting the promise of CET for visualizing machines
at work inside cells [3, 4].

In this chapter, we will briefly review the tech- 
nical advances which support the TEM resolution 
revolution. Then, we will highlight the main 
advantages of cryo-TEM for structural determination
of macromolecular complexes, underlining
the possibility to study related conformations
and structural transformations, as well as the pos- 
sibility to correlate cryo-TEM data with other 
methods (X-ray crystallography, mass spectrom- 
etry, modelling). Finally, we will briefly describe 
the bases and main applications of electron 
tomography in the context of placing macromo- 
lecular machines under physiological conditions 
in the cellular environment. 


13.2 Recent Technical Advances 


Success of cryo-TEM today is the result of a
steady accumulation of methodological
improvements in sample preparation, data
production and acquisition, and image processing.
Nevertheless, after 2012, this trend experienced
a sudden boost (the resolution revolution). This
acceleration was due to a small
number of technical advances in data
acquisition and processing which alleviated
one major problem of cryo-TEM
images: frozen, unstained biological samples are
very sensitive to radiation damage and have
very low contrast. To avoid destroying the
sample, very low electron doses are used,
which yields very noisy images. Progressive
automation of data acquisition [5] proved
instrumental in generating the large numbers of
images that must be averaged to reconstruct the
cryo-TEM data. Recently, the use of phase plates [6],
which are devices inserted in the back focal plane
of the objective lens, has shown great potential to
improve the contrast in images of vitrified
samples. Even under the best imaging conditions, an
additional bottleneck is recording the data. The
standard use of CCDs resulted in low efficiency,
limiting the resolution attainable from such noisy
images. A major milestone was the incorporation
of direct electron detectors (DEDs). These
devices, owing to their efficient recording of single
electrons without the need for intermediate devices,
have improved the sensitivity, quality and
contrast of the images, and they make it possible to
overcome problems derived from sample
movement and distortion during acquisition by
using movie-like fast frame acquisition [7].
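
The reason why so many low-dose images are needed can be illustrated with a toy calculation. The sketch below is not a detector model: it assumes perfectly aligned images, a constant true signal and independent Gaussian noise, and the function name and parameters are hypothetical. Under those assumptions, averaging N images should reduce the residual noise by a factor close to the square root of N.

```python
import random
import statistics

def residual_noise(n_images, n_pixels=2000, sigma=10.0, seed=0):
    """Std. dev. of the noise left after averaging n_images aligned copies.

    Toy model (an assumption): every pixel has a true signal of 1.0 and
    each image adds independent Gaussian noise of standard deviation sigma.
    """
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_pixels):
        samples = [1.0 + rng.gauss(0.0, sigma) for _ in range(n_images)]
        averaged.append(sum(samples) / n_images)
    return statistics.pstdev(averaged)

single = residual_noise(1)      # noise level of one low-dose image
hundred = residual_noise(100)   # noise level after averaging 100 images
gain = single / hundred         # expected to be close to sqrt(100) = 10
```

In real data the images must first be aligned and classified, so the gain obtained in practice is lower than this idealized figure.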

As a result of automation and the use of DEDs,
large data sets are generated in cryo-TEM. To take
full advantage of these data, improved processing
packages are used to extract optimized images from
the movies recorded by DEDs and to standardize,
classify, and combine data from individual images to
generate the three-dimensional volumes. A number
of different packages are currently used, each
with specific advantages: FREALIGN [8],
EMAN [9], XMIPP [10], and RELION
[11]. There are also suites that give access to
different packages, such as SCIPION [12].

Taken together, the new automated microscopes 
equipped with phase plates and DEDs, and the 
availability of improved data processing packages, 
make cryo-TEM a very powerful tool for structural 
analysis of macromolecular complexes, not only 
yielding structures, but also offering interesting 
added value for tackling dynamic processes or 
addressing conformational heterogeneities. 


13.3 Analysis of Macromolecular 
Complexes 

13.3.1 Purification Requirements 

A main advantage of cryo-TEM analysis is
that large amounts of sample are not required
to prepare microscope grids. Each grid
requires around 5 µl of sample at moderate
concentration (0.1 to 0.5 mg/ml), and although
several rounds of microscopy tests are needed
to define optimal conditions for visualization
(selection of appropriate buffer, concentration of
sample, removal of undesirable chemical
contaminants), the final amount of protein
required (several tens of micrograms) is always
quite moderate, well below the amounts needed
for other methods such as NMR or X-ray
crystallography (several tens of milligrams).
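
As a rough check of the figures quoted above, the amount of protein applied to a single grid follows directly from the volume and concentration. The helper below is a hypothetical illustration, and the 30 µg budget is an assumed value standing in for "several tens of micrograms".

```python
def protein_per_grid_ug(volume_ul, conc_mg_per_ml):
    """Mass of protein applied to one grid, in micrograms.

    1 mg/ml equals 1 ug/ul, so mass (ug) = volume (ul) * concentration (mg/ml).
    """
    return volume_ul * conc_mg_per_ml

low = protein_per_grid_ug(5, 0.1)     # 0.5 ug per grid at 0.1 mg/ml
high = protein_per_grid_ug(5, 0.5)    # 2.5 ug per grid at 0.5 mg/ml
grids_from_30_ug = int(30 / high)     # assumed 30 ug budget: 12 grids
```

Even at the upper concentration, a few tens of micrograms therefore cover a dozen or more grids, which is consistent with the need for several rounds of screening.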

Another aspect to take into account is the fact
that cryo-TEM does not require a highly purified
sample. This does not mean that very heterogeneous
samples (lysates, first clarification steps in
purification protocols) are suitable for direct
study but, rather, that samples for cryo-TEM do
not need to be 100% pure to be analyzed. This is
so because electron microscope images show
individual projection views of each component
in the sample. The main components in the sample
can therefore be detected and identified
(by correlating the images with the biochemical
results), and the particles that represent the
major biochemical component can be selected. This
makes it possible to study
complexes that are difficult to produce and to
purify, as is usually the case for those which
cannot be overexpressed ex vivo and demand
purification from naturally existing cells and
tissues. A comprehensive account of the importance
of sample preparation for successful cryo-TEM
of macromolecular complexes, and a review
of innovative approaches to improve sample
quality, is given in Chap. 12 by A. Deniaud,
B.V. Kabasaki, J. C. Buffon and C. Schaffitzel.


13.3.2 Structure Determination: From 
Molecular Up to Atomic 
Resolution 


Structure determination by single-particle cryo-TEM
presents a number of advantages beyond
the pure structure description. An illustrative
example of the added value of cryo-TEM is
the study of CorA, the major Mg²⁺
uptake system in prokaryotes [13]. This
pentameric complex is ~200 kDa in size and
is a Mg²⁺-dependent channel which is the key
pathway for electrophoretic Mg²⁺ uptake. X-ray
crystallographic studies of CorA crystals
obtained in the presence and absence of Mg²⁺
did not reveal significant changes in the channel
conformation, thus leaving open the question
of how the gating mechanism could take
place.

The group of S. Subramaniam carried out a
single-particle cryo-TEM study to determine the
structures of detergent-solubilized CorA from
Thermotoga maritima under conditions that
stabilize its closed (Mg²⁺-bound) and open
(Mg²⁺-free) states [14]. Figure 13.1 shows the
results obtained for the five-fold symmetric,
Mg²⁺-bound state. Typical cryo-TEM fields show
characteristic projection views of the complexes
(Fig. 13.1a). Further analysis of these views of
the particles reveals that they can be grouped in
classes (Fig. 13.1b). This classification step is a
great advantage of the method, as it enables the
detection of possible contaminants (in silico
purification), and also provides a collection of
two-dimensional projections of the particles
which can be combined to produce a
three-dimensional reconstruction. In this case, the
resolution (3.8 Å) was sufficient to allow tracing
of the five polypeptide subunits, revealing
α-helical regions, β sheets, and even strong
densities for larger side chains (Fig. 13.1c).
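
The idea of in silico purification by classification can be sketched with a deliberately simplified example. The code below uses plain k-means on short one-dimensional "views" and is an assumption-laden toy: the two views, the noise level and the helper names are hypothetical, and real 2D classification (for instance in the packages cited above) also has to solve for in-plane rotations and shifts, which are ignored here.

```python
import random

def kmeans(images, centers, n_iter=10):
    """Plain k-means: assign each image to the nearest center, then re-average."""
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(img, centers[k])))
            for img in images
        ]
        for k in range(len(centers)):
            members = [img for img, lab in zip(images, labels) if lab == k]
            if members:  # keep the old center if a class becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

rng = random.Random(1)
complex_view = [1.0] * 8 + [0.0] * 8        # hypothetical view of the complex
contaminant_view = [0.0] * 8 + [1.0] * 8    # hypothetical contaminant

def noisy(view):
    return [v + rng.gauss(0.0, 0.2) for v in view]

# 60 particles of the complex and 20 contaminant particles, mixed on one grid.
images = ([noisy(complex_view) for _ in range(60)]
          + [noisy(contaminant_view) for _ in range(20)])

labels, class_averages = kmeans(images, [list(images[0]), list(images[-1])])
class_sizes = [labels.count(0), labels.count(1)]
major_class = class_sizes.index(max(class_sizes))
selected = [img for img, lab in zip(images, labels) if lab == major_class]
```

Only the particles in `selected` would be carried forward to the three-dimensional reconstruction, while the minor class is discarded as a contaminant.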


13.3.3 Localization of Ligands, 
Substrates and Small Molecules 


The possibilities of cryo-TEM for structure
determination depend on the achievable resolution
limit and on whether that resolution is
sufficient to detect the location of small molecules
such as ligands and substrates. The extension of
the method towards drug investigation in
pharmaceutical studies depends heavily
on these capabilities. The studies of da Fonseca
on the interaction of substrates with the
proteasome in the framework of its use to fight
malaria are a good example of how cryo-TEM
studies might be instrumental for the design of
improved therapeutic strategies against an
important human disease [15].

Fig. 13.1 Cryo-EM of TmCorA in the presence of
Mg²⁺. (a) Representative cryo-EM image
with CorA particles visible in different orientations.
(b) Selected 2D class averages used to produce
an initial model. (Bottom) Top and side views
of the final map, colored according to local
resolution. Slices through the map at indicated
positions are shown on the right. (Reproduced
with permission from [14])

The eukaryotic proteasome is a large protein
complex of about 750 kDa, which plays a
fundamental role in cell homeostasis by degrading
specifically tagged proteins. It has been considered
a good target for different diseases, as it is
involved in the degradation of key cell cycle
regulators. Structures of different subcomplexes
related to the proteasome have been solved by
X-ray crystallography. Nevertheless, the analysis
of the interaction of inhibitors with the
proteasome has been limited by the difficulty in
obtaining crystals under the conditions required for
drug interaction. This is an area where cryo-TEM
has proven to be quite appropriate, as
samples for cryo-TEM can be prepared under
the conditions currently used for studies on drug
interaction. The fact that fast freezing is the only
manipulation required makes this procedure ideal
for working under a wide variety of pH, salt and
ionic conditions at concentrations that are typical
for biochemical assays.

The structure of the 20S core of the human
proteasome was solved by cryo-TEM at about
3.5 Å resolution [16]. Figure 13.2a,b shows
planes of the three-dimensional reconstruction of
the core, built by seven α and seven β subunits
arranged in pseudo-seven-fold rings, two copies
of which stack into a barrel-shaped assembly,
revealing secondary structure elements (α-helices
and β-strands) of the structural polypeptides of the
proteasome core. When this structure was
compared to the crystal structure derived from a
mouse apo20S core, the agreement found allowed
the polypeptide chain to be completely identified
(Fig. 13.2c) in the cryo-EM volume. Interestingly,
this proteasome core was complexed
with AdaAhx3L3VS, a highly potent
proteasome inhibitor that irreversibly binds the
20S core proteolytic active sites. The resolution
attained by cryo-EM in the complex allowed
densities corresponding to the inhibitor to be
located, which extended out of the active sites
(Fig. 13.2d). The possibility to detect the location
and conformation of the inhibitor at the different
active sites of the proteasome demonstrates that
cryo-EM can be a most useful tool for structural
studies of protein-ligand interactions.

Fig. 13.2 Cryo-EM map of the human
20S-AdaAhx3L3VS complex. (Left, top) Individual
sections, 1 Å thick, of the 3D map are represented
as grey-scale. Regions showing the pattern of
α-helical secondary structure and the separation of
sheet-forming β strands are boxed. (Left, bottom)
Section of the map showing the global agreement
between the map densities and the
coordinates of the 20S proteasome complex. Close-up
representations of an α-helix and a β strand are shown,
illustrating substantial recovery of side-chain information.
(Right) Structural formula of the inhibitor AdaAhx3L3VS.
The map of the human 20S-AdaAhx3L3VS complex,
Fourier low-pass filtered to 3.4 Å, is shown as a mesh.
Clear densities are seen extending from the N-terminal Thr
residues of the β5, β2 and β1 subunits that are consistent
with the -L3VS moiety of the AdaAhx3L3VS molecule,
with the vinyl sulfone group and the side chains of the
three leucine residues shown clearly resolved at the β5 site.
(Modified and reproduced with permission from [16])


13.3.4 Study of Related 
Conformations and Snapshots 
of Dynamic Processes 


The existence of different conformational states
of a functional complex is probably more
frequent than initially expected. Most of the
knowledge we have accumulated on the structure of
macromolecules and macromolecular complexes
has been derived from protein crystallography,
which provides a single, “frozen” view of the
possible conformational space, namely the one
compatible with crystallization. One of the clear
advantages of cryo-EM is that it does not impose
any condition that “fixes” one of the possible
conformations, as all of them might be collected
simultaneously during image acquisition. If a
sufficient number of views is collected, it is
possible to search the conformational space of any
given complex and thus to study not only the most
frequent conformations, but also less abundant ones,
which may become more populated under certain
functional conditions. Obviously, this type of
study requires applying powerful classification
methods to a large number of views to sort the
specimen projections into distinct classes which,
in turn, have to be translated into a
three-dimensional classification [11]. This added value
for structural analysis makes cryo-EM a unique
tool to study structural heterogeneity and its
relationship to functional states.

A particular example of the potential of cryo-EM
for detecting different conformations of a
protein complex is the study of membrane
receptors, where the existence of structural
changes relevant to their function, together with
the added difficulty that they are membrane
proteins, poses a great technical challenge. The
study by Meyerson et al. [17] of a type of
ionotropic glutamate receptor (iGluR), a family of
major mediators of excitatory synaptic
transmission in the central nervous system, is quite
representative. AMPA (α-amino-3-hydroxy-5-
methyl-4-isoxazole propionic acid) receptors are
one of the three subfamilies of these receptors,
which have been the subject of many studies over
the years due to their role in brain function and
development. The AMPA receptor is a tetramer of
four homologous subunits, each with an extracellular
amino-terminal domain (NTD, referred to below as
ATD), a transmembrane domain (TMD) and
an intracellular C-terminal domain (CTD). The
combination of biochemical studies and
crystallographic analysis of the isolated domains of
these complexes has resulted in working models
of how the transition between the different states
of these channels takes place. Nevertheless, a
complete picture of the structural relationship
among the different states was missing. The
incorporation of cryo-EM into these studies has
given a detailed view of the structural bases of
glutamate receptor activation and desensitization.
Figure 13.3 shows that a modified form of the AMPA
receptor (GluA2) in a desensitized state can adopt
many different conformations in solution (Fig. 13.3a).
Three-dimensional classification defined
three main classes which, although at a
relatively poor resolution (2-3 nm), revealed
different orientations of the ATD domains
(Fig. 13.3b), and the comparison with the closed
and active states of the receptor revealed striking
differences in the way both the ATD and the
ligand-binding domains (LBD) are arranged in the
different states, including symmetry transitions.
These data finally made it possible to integrate
previous biochemical and structural data from
different origins [17].

Another interesting possibility derived from
the way single-particle reconstructions are obtained
in cryo-EM is to tackle dynamic processes by
studying snapshots along a functional cycle.
This sort of time-resolved microscopy means not
only that samples have to be taken over a
period of time (made possible by the way frozen
samples are prepared) but also that the analysis of
the projection views within a sample where a
reaction is taking place yields different
three-dimensional states which can, in turn, be
correlated with defined functional steps.

A classical study reviewed by the group of
H. Stark [18] showed how the tRNA moves in
the ribosome during translocation in protein
synthesis. In this study, samples were taken by
fast freezing at different times along the
translocation reaction, taking advantage of the
relatively slow rate of certain steps. A large number
of images was processed (more than 2 × 10⁶),
yielding around 50 different three-dimensional
structures, which were then refined to show
eight key states (snapshots). By docking atomic
structures of the different ribosomal components
into the experimental volumes obtained by cryo-EM,
it was possible to define the movement of the
two tRNAs into different positions in the
ribosome (Fig. 13.4), also revealing the tight
coupling of this tRNA movement with different
conformational changes in the ribosome. The use of
cryo-EM also allows the behavior of
functional complexes to be studied under different
conditions, such as temperature, pH, concentration
of specific components, inhibitors, etc., thus
offering a wide application spectrum with unique
possibilities with respect to other alternative and
complementary methods.

Fig. 13.3 Conformational ensemble of desensitized
GluA2 by cryo-EM. (a) Representative desensitized
state GluA2em 2D class averages from initial
classification of 35,083 projection images. Selected
class averages that illustrate the range of observed
conformations are highlighted. (b) Segmented
isosurface representations of three distinct
desensitized state GluA2em structures, with
the ATD and LBD layers identified in blue and orange,
respectively. (Modified and reproduced with permission
from [17])


13.4 In Situ Structural
Determination by Electron
Tomography


During recent years it has become increasingly
evident that understanding biological functions
demands not only the study of individual cell
components, but also an understanding of their
interactions and the way they are organized in
space and time. Cellular processes are organized
in a precise spatial framework, and the functional
states of the different cellular machineries are
dependent on complex interactions among them,
which, in turn, are related to their location in the
cellular framework. Different approaches have
tackled this fundamental problem from diverse
perspectives, providing comprehensive insights
into the cellular proteome and interactome.

Fig. 13.4 Dynamics of ribosome-tRNA interactions
and of the 30S subunit. tRNA positions in selected states
of (retro-)translocation (left-hand boxes) and contacts with
50S subunit regions for representative sub-states (middle
panels) are depicted along with the actual state of the 30S
subunit (right-hand panels) in yellow, overlaid with the
preceding state in grey. Note the anticlockwise 30S body
rotation from pre1 to pre5 and the switch back to the
non-rotated conformation upon transition to post1. 30S
landmarks are: b body, h head, n neck, pt platform, sh
shoulder. Protein L1 in the L1 stalk is depicted in light
blue (no clear density for L1 in pre1). (Modified and
reproduced with permission from [18])

One key aspect of understanding the organization
of the cellular machinery is the definition of
the structural context, i.e., the cellular map of
compartments and components at the highest
possible resolution. To this end, correlation of light
microscopy and electron microscopy has
great potential to combine the whole-cell
perspective in vivo with the high intrinsic resolution
of cryo-EM [19]. The combination of fast freezing,
which offers the possibility to preserve the cell
structure at a near-physiological level, together with
recent technical developments to obtain tomographic
data using cryo-EM, has positioned cryo-electron
tomography as a method to bridge the structural
analysis of the cell components with their precise
three-dimensional mapping in the cellular
landscape [20]. This method is based on the fact that
electron microscopy images are two-dimensional
projections, from which three-dimensional
information can be retrieved using tomographic
procedures. Implementation of improved sample
preparation techniques, including fast freezing
and cryo-substitution, makes it possible to prepare
cells and even tissues for electron microscopy
observation. The two main limitations of the method
are the modest electron penetration in biological
samples (up to 0.5 µm) and the radiation damage
during data acquisition. Provided suitable samples
are prepared, electron tomographic methods can
retrieve three-dimensional reconstructions by
collecting series of projection views while tilting
the sample at different angles (reviewed in [4]).
These projections can be merged into a tomographic
reconstruction using different methods, and the final
volume can be analyzed and processed to extract
and to further visualize the main structural features.
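
The principle of retrieving spatial information from a tilt series can be sketched with the simplest possible merging scheme, unweighted back-projection, applied to a two-dimensional slice. This is only an illustration under stated assumptions: the function names, the 21-pixel phantom and the tilt range of -60 to +60 degrees in 3-degree steps are hypothetical choices, and practical reconstructions use weighted or iterative methods and finer interpolation.

```python
import math

def detector_bin(x, z, n, angle_deg):
    """Detector pixel hit by voxel (x, z) of an n x n slice at a given tilt."""
    c = (n - 1) / 2.0
    a = math.radians(angle_deg)
    t = (x - c) * math.cos(a) + (z - c) * math.sin(a)
    return int(round(t)) + n          # offset keeps the index non-negative

def project(slice_2d, angle_deg):
    """Sum the density of the slice along the beam direction (one tilt image)."""
    n = len(slice_2d)
    profile = [0.0] * (2 * n)
    for z in range(n):
        for x in range(n):
            profile[detector_bin(x, z, n, angle_deg)] += slice_2d[z][x]
    return profile

def back_project(profiles, angles_deg, n):
    """Smear every tilt image back across the slice and add them up."""
    recon = [[0.0] * n for _ in range(n)]
    for profile, angle in zip(profiles, angles_deg):
        for z in range(n):
            for x in range(n):
                recon[z][x] += profile[detector_bin(x, z, n, angle)]
    return recon

n = 21
phantom = [[0.0] * n for _ in range(n)]
phantom[6][14] = 1.0                       # a single point-like "complex"
angles = list(range(-60, 61, 3))           # limited tilt range, 3 degree steps
tilt_series = [project(phantom, a) for a in angles]
recon = back_project(tilt_series, angles, n)
peak = max((recon[z][x], (z, x)) for z in range(n) for x in range(n))[1]
```

The strongest voxel of the reconstruction coincides with the position of the point in the phantom. Because the tilt range is limited, the reconstructed point is also smeared along the beam direction, an effect that any quantitative interpretation of tomograms has to take into account.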


13.4.1 Sample Preparation
for Cryo-Electron Tomography


Acceptable cell preservation for relatively thin
samples can be obtained either by direct plunge
freezing [21] or, in the case of thicker samples, by
more complex setups such as high-pressure freezing
[22]. In any case, the limiting factor for obtaining
cryo-electron tomographic reconstructions is electron
penetration in the frozen, hydrated sample, which
cannot be thicker than 0.5 µm. This limits the
application of the method to areas close to
the cell periphery or, more frequently, makes it
necessary to obtain thin sections of the frozen sample.
To this end, cryo-sectioning or cryo-substitution
followed by conventional thin sectioning has
been used with some success [22, 23]. A more
attractive way to obtain sections of frozen cells
with accurate thickness, and without the
compression artifacts derived from mechanical
sectioning procedures or the requirement of plastic
embedding, is to produce lamellas by cryo-focused
ion beam milling [24]. In this method
(recently reviewed by Schaffer et al. [25]), a
dual-beam focused ion beam machine is used at
liquid nitrogen temperature to mill a sample using
a gallium ion beam at defined angles. By carefully
monitoring the operation with the scanning electron
microscope, it is possible to produce a lamella by
progressive thinning of the frozen sample. This
micromachining procedure yields samples in which
areas with a thickness of 100 nm can be
transferred to the cryo-EM for cryo-electron
tomography.

The type of information attainable using this
approach is shown in Fig. 13.5. Cryo-tomographic
reconstructions of frozen lamellas
obtained from Chlamydomonas [25] show
molecular complexes in the cell cytoplasm. Besides the
overview of organelles in their native
environment (Fig. 13.5a,b), such as mitochondria and
chloroplasts, a detailed inspection of the
tomograms reveals the presence of many different
complexes bound to cell organelles. Mitochondrial
membranes show bowl-shaped aggregates
(Fig. 13.5c), and the cristae are covered by
complexes, most likely ATP synthase dimers
(Fig. 13.5d). Chloroplast thylakoids also show
ATP synthase monomers, together with other
larger complexes (Fig. 13.5e). Clathrin- and
COPI-coated vesicles are also visible near the
Golgi (Fig. 13.5f-h). Although the nominal
resolution of these tomograms is not better than
several nm, the possibility to detect and identify
complexes within the cytoplasm of native cells is
a major step forward in the definition of the cell
cartography.

Fig. 13.5 Optimized cryo-focused ion beam sample
preparation for in situ structural studies of membrane
proteins. Visualization of molecular complexes within
Chlamydomonas cell FIB lamellas. (a-b) Overview slices
from tomographic volumes. Mito mitochondrion, Chlor
chloroplast, Ac acidocalcisome, ER endoplasmic
reticulum. The framed regions in (a) and (b) correspond to
the indicated close-up views in (c-h). These magnified
views show different tomographic slices than those
depicted in (a) and (b). (c) Circular bowl-shaped complexes
(arrowheads), seen as semicircles in this side view,
positioned between the mitochondrion's inner and outer
membranes. (d) Top view of rows of ATP synthase dimers
bound to a crista membrane. (e) Single ATP synthases
bound to thylakoid membranes (white arrowheads) and a
large complex that spans the outer and inner membranes of
the chloroplast envelope (black arrowhead). (f-g) Clathrin
coats in the cytoplasm (f is from a region of the tomogram
in a that is not shown). (h) Vesicles near the cis-Golgi with
complete (top), partial (bottom) or no (middle) COPI
coats. The lamellas were 140 nm (a) and 95 nm (b)
thick, and each covered with 60 nm of condensed water
vapor followed by 5 nm of sputtered Pt. Tomograms were
acquired with the Volta phase plate (VPP), a target
defocus of -0.5 µm, and an object pixel size of 0.342 nm.
(Modified and reproduced with permission from [25])


13.4.2 Improvement of Tomogram
Interpretation and Resolution


The possibility of obtaining structural information
on complexes at molecular resolution from
cryo-electron tomograms depends on several factors.
The quality of lamella preparation, as well as the
reproducibility of the thickness, is a first factor
which influences the resolution potentially
attainable in the tomograms. Given the
poor penetration power of electrons, the thicker
the lamella, the poorer the signal-to-noise ratio it
yields in the tomogram. This problem
might be alleviated by using energy filtering to
remove inelastic scattering, although the outcome
will depend on the low intrinsic contrast of
frozen, unstained material and the relative
concentration of material in the different cell areas.
Also related to the interaction of electrons with
the frozen material, the substantial energy transfer
from the electron beam into the sample requires
the use of very limited electron doses. This
leads to very low-contrast images, which require
defocusing to generate phase contrast.
The recent incorporation of phase plates (PP) has
greatly improved the generation of contrast and the
production of images with increased signal-to-noise
ratios. There are two main types of phase
plates, the Zernike PP [26, 27], and the more
recent Volta PP [28]. Although phase plates can
also be used in data acquisition for single particle
cryo-EM reconstruction, they show their highest
potential in tomography, where, in combination
with energy filters, they seem to be the best option
for automated tomographic series acquisition.

The full exploitation of the information
content in cryo-electron tomography requires a set of
steps integrated in complex workflows (reviewed
in [29]). These procedures include the use of
specific image processing methods, which allow
for contrast transfer function correction,
denoising and reconstruction. Interpretation
of the tomographic volume then requires
additional steps. Selection of the regions of interest
is normally done by segmentation of particular
areas or sub-volumes, which can then be
searched for specific macromolecular complexes.

One particular aspect of interest is the possi- 
bility to improve the resolution attainable from 
tomographic data by averaging those structural 
components that are present in multiple, identical 
forms within the cell cytoplasm [29, 30]. To this 
end, template matching is very useful for auto- 
matic search of specific macromolecular 
complexes, using cross correlation of known 
structures with the experimental tomographic vol- 
ume. These procedures lead to a map of the 
positions of the specific complexes of interest 
within the tomogram and provide the coordinates 
of the spatial distribution of these complexes. 
From these coordinates, subtomograms 
containing each specific complex can be compu- 
tationally extracted and then averaged using the 
same rationale as that used for single particle 
reconstruction (see above). 
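
Template matching can be illustrated with a minimal real-space cross-correlation search. The sketch below is a toy under several assumptions: the cross-shaped template, the volume size, the noise level and the detection threshold are hypothetical, and only translations are searched. Practical implementations also scan orientations, normalize locally, compensate for the missing wedge and usually compute the correlation in Fourier space.

```python
import itertools
import random

SIZE, T = 10, 3                      # volume and template edge lengths (voxels)
rng = random.Random(0)

# Template: a 3x3x3 "cross" (centre voxel plus its six face neighbours),
# made zero-mean so that uniform background gives a score near zero.
template = {
    (i, j, k): 1.0 if abs(i - 1) + abs(j - 1) + abs(k - 1) <= 1 else 0.0
    for i, j, k in itertools.product(range(T), repeat=3)
}
mean = sum(template.values()) / len(template)
template = {pos: v - mean for pos, v in template.items()}

# Noisy "tomogram" containing two copies of the complex at known offsets.
volume = {pos: rng.gauss(0.0, 0.1)
          for pos in itertools.product(range(SIZE), repeat=3)}
planted = [(1, 2, 3), (6, 5, 4)]
for ox, oy, oz in planted:
    for (i, j, k), v in template.items():
        if v > 0:                    # voxels belonging to the cross
            volume[(ox + i, oy + j, oz + k)] += 1.0

def score(offset):
    """Cross-correlation of the template with the sub-volume at this offset."""
    ox, oy, oz = offset
    return sum(v * volume[(ox + i, oy + j, oz + k)]
               for (i, j, k), v in template.items())

# Accept offsets scoring above half of the ideal, noise-free peak.
threshold = 0.5 * sum(v for v in template.values() if v > 0)
offsets = itertools.product(range(SIZE - T + 1), repeat=3)
hits = sorted(o for o in offsets if score(o) > threshold)
```

The list `hits` plays the role of the coordinate map described above: each entry marks a position from which a subtomogram could be extracted for averaging.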

Subtomogram averaging is a major tool to
expand the use of tomography to perform
structural biology in situ. The extensive use of
improved classification and refinement methods,
already developed for single particle
reconstruction, has led to structures of different
relevant complexes in their physiological
environment at nanometric resolution. Examples of
such complexes are given in Fig. 13.6, ranging
from very large nuclear pore complexes to
relatively small viral spikes or ribosomes attached
to the endoplasmic reticulum [30].

Fig. 13.6 The potential of subtomogram averaging to
elucidate structural biology in situ. (Top) An overview
of subtomogram averaging. Subtomograms are extracted
from the tomogram. They are rotationally and
translationally aligned against a reference. The aligned
subtomograms are then averaged to generate a new
reference. The new reference is then used for alignment
of the subtomograms again. This procedure is repeated
until the reference stabilizes. (Bottom)
Examples of recent structures solved by subtomogram
averaging, shown approximately to scale. (a) Ribosomes
on the ER membrane. (b) COPI coated vesicles. (c) The
glycoprotein spike of HIV [11]. (d) The human nuclear
pore. (e) A microtubule doublet from a Chlamydomonas
flagellum. Panels were adapted from the original
references. (Modified and reproduced with permission
from [30])
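
The iterative align-and-average loop summarized in the caption of Fig. 13.6 can be sketched in one dimension. The example is a simplification under stated assumptions: it searches circular translations only, starts from the first particle as reference, and uses a hypothetical three-pixel motif and noise level. Real subtomogram averaging searches three-dimensional rotations and translations and compensates for the missing wedge.

```python
import random

N = 32                                   # length of each 1D "subtomogram"
rng = random.Random(0)
motif = [0.0] * N
motif[10:13] = [3.0, 2.0, 1.0]           # asymmetric feature (hypothetical)

def roll(signal, s):
    """Circular shift such that result[j] = signal[(j + s) % N]."""
    return [signal[(j + s) % N] for j in range(N)]

# Thirty noisy copies of the motif, each displaced by an unknown shift.
true_shifts = [rng.randrange(N) for _ in range(30)]
particles = [
    [v + rng.gauss(0.0, 0.2) for v in roll(motif, -t)]
    for t in true_shifts
]

def best_shift(particle, reference):
    """Shift that maximises the cross-correlation with the reference."""
    return max(range(N),
               key=lambda s: sum(p * r for p, r in zip(roll(particle, s), reference)))

reference = particles[0]                 # start from one particle as reference
for _ in range(3):                       # align, average, repeat
    shifts = [best_shift(p, reference) for p in particles]
    aligned = [roll(p, s) for p, s in zip(particles, shifts)]
    reference = [sum(col) / len(aligned) for col in zip(*aligned)]
```

After the loop, the recovered shifts differ from the true ones only by a common offset set by the initial reference, and `reference` is a much less noisy copy of the motif than any individual particle.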

The nanometric resolution volumes
obtained from tomograms can also be useful for
fitting structural data obtained by single
particle reconstruction of isolated complexes
(or some of their components), or for comparison
with structures obtained after purification.
A recent study of the proteasome [31] is a
clear example of the power of subtomogram
averaging to define not only the localization of the
26S proteasome in situ, but also different
forms of the complex (with or without
substrate), which correlate their specific function
with unique positions at the cellular level.


13.5 Conclusions 


The successful application of new advances in
cryo-TEM makes it possible to retrieve structures of
macromolecular complexes at near-atomic resolution,
thus positioning cryo-TEM as a major tool for
structural biology. Achieving this resolution
level also provides the possibility to detect the
presence of ligands and substrates incorporated
under truly physiological conditions
(temperature, concentration, etc.). The incorporation of
improved detectors and imaging devices, such as
phase plates, will generate better signal which,
in turn, will allow the reconstruction of
macromolecules and macromolecular complexes of
smaller sizes. The 100 kDa barrier has already been
broken, and this might attract a great deal of
interest in the application of these methods to
pharmacological studies, extending the possibilities of
crystallographic analyses in the pharmaceutical
industry.

One of the most interesting aspects of cryo-TEM
derives from the fact that new cryo-TEM
instruments and automation procedures produce
very large data sets containing huge numbers of
individual particles from minute amounts of
sample. Improved classification methods applied to
these data sets are able to detect the possible
presence of different conformational states of the
complexes under study. The analysis of these
structures in atomic detail might be key to providing
novel insights into functional features and
relationships of macromolecular machineries.

Cryo-TEM also offers the possibility of
approaching time-resolved microscopy.
Fast freezing of samples allows, in principle,
snapshots to be obtained of biologically relevant
structural changes of complexes correlated with a
certain function. The development of new
methods combining stopped-flow and ultra-fast
freezing could open in the near future a new
way to approach the study of dynamic processes
by cryo-TEM.

The use of advanced cryo-TEM is also
instrumental in structural cell biology. Tomographic
procedures using the new technical possibilities
available today are yielding new insights into the
existence of conformational changes correlated with
the precise subcellular localization of
macromolecular machineries, and into their interactions
with other cellular components.